Always encode your Requests payloads in Python
At Bixoto, we use a lot of different APIs to interface with suppliers and other services. Today, I was working with an XML API using requests (via api_session) and xmltodict.
TL;DR: use requests.post(url, data=my_string.encode("utf-8")) and not requests.post(url, data=my_string).
Long version below:
The simplified code looked like this:
    from api_session import APISession
    import xmltodict


    class TheClient(APISession):
        def post_xml_api(self, path: str, payload: dict) -> dict:
            # Transform a dict into an XML string
            xml = xmltodict.unparse(payload)
            # POST it to the API
            response = self.post_api(
                path,
                data=xml,
                headers={"Content-Type": "application/xml; charset=utf-8"},
            )
            # Parse the response XML as a dict again
            response.encoding = response.apparent_encoding
            return xmltodict.parse(response.text)

        def hello(self, name: str) -> str:
            res = self.post_xml_api("/hello", {"name": name})
            return res["message"]

    # ...

    client = TheClient(base_url="...")
    print(client.hello("John"))
    # => "Hello John!"
This worked great until I called client.hello() with a name that contained accents, such as “Élise”. The API provider complained that it wasn’t receiving UTF-8 data.
To debug the API client, I set up a simple server using nc in another terminal:
nc -l 1234
Then I used it as my base URL:
    # note: this is a feature of api_session, not requests
    client = TheClient(base_url="http://localhost:1234")
    client.hello("Élise")
This is the resulting request:

    POST /hello HTTP/1.1
    Host: localhost:1234
    User-Agent: python-requests/2.31.0
    ...
    Content-Type: application/xml; charset=utf-8
    Content-Length: 57

    <?xml version="1.0" encoding="utf-8"?>
    <name>�lise</name>
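There was indeed an issue with the encoding: the “É” was sent as the single Latin-1 byte 0xC9, which is not valid UTF-8 on its own, hence the � in the terminal. I thought that Python used UTF-8 everywhere by default, but that’s not the case here. The default charset for HTTP is ISO-8859-1, aka Latin-1 (see RFC 2616).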
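To make the difference concrete, here is a small sketch using only the standard library. It compares the two encodings of the name from the example and shows why a UTF-8 terminal displays a replacement character for the Latin-1 version. The variable names are only illustrative.

```python
name = "Élise"

# The same string, encoded two ways
latin1_bytes = name.encode("iso-8859-1")
utf8_bytes = name.encode("utf-8")

print(latin1_bytes)  # b'\xc9lise'      -> "É" is one byte
print(utf8_bytes)    # b'\xc3\x89lise'  -> "É" is two bytes

# A UTF-8 reader (terminal or API) cannot decode the Latin-1 bytes
as_seen_by_utf8_reader = latin1_bytes.decode("utf-8", errors="replace")
print(as_seen_by_utf8_reader)  # �lise
```

The one-byte difference also matches the Content-Length of 57 in the captured request: a UTF-8 body would have been one byte longer.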
Requests (through urllib3) builds on Python’s http.client, which respects that. Its documentation says:
“If body is a string, it is encoded as ISO-8859-1, the default for HTTP.”
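This behavior can be reproduced without Requests or nc. The sketch below is an assumption-laden stand-in for the setup above: it starts a throwaway local server with http.server, sends the same XML once as a str and once as UTF-8 bytes through http.client directly, and records the raw bytes received. CaptureHandler, post and received are hypothetical names made up for this example.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

received = []  # raw request bodies captured by the local server


class CaptureHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly the announced number of bytes and store them
        length = int(self.headers["Content-Length"])
        received.append(self.rfile.read(length))
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # silence the default request logging


def post(port, body):
    conn = http.client.HTTPConnection("127.0.0.1", port)
    conn.request(
        "POST",
        "/hello",
        body=body,
        headers={"Content-Type": "application/xml; charset=utf-8"},
    )
    conn.getresponse().read()
    conn.close()


# Port 0 lets the OS pick a free port
server = HTTPServer(("127.0.0.1", 0), CaptureHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

xml = "<name>Élise</name>"
post(port, xml)                  # str: http.client encodes it as ISO-8859-1
post(port, xml.encode("utf-8"))  # bytes: sent unchanged

server.shutdown()
server.server_close()

print(received[0])  # b'<name>\xc9lise</name>'
print(received[1])  # b'<name>\xc3\x89lise</name>'
```

Note that the Content-Type header claims UTF-8 in both cases; only the second request actually delivers it.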
The solution is to explicitly encode the request body:
    # Before
    response = requests.post(url, data=xml_string)

    # After
    response = requests.post(url, data=xml_string.encode("utf-8"))
That way, the body is already encoded and http.client doesn’t have to encode it itself.
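If several call sites build string payloads, one option is to centralize the encoding in a tiny helper. This is only a sketch: to_utf8_body is a hypothetical name, not part of requests or api_session, and it assumes the API expects UTF-8.

```python
def to_utf8_body(data):
    """Return str payloads as UTF-8 bytes; leave anything else untouched."""
    if isinstance(data, str):
        return data.encode("utf-8")
    return data


print(to_utf8_body("<name>Élise</name>"))  # b'<name>\xc3\x89lise</name>'
print(to_utf8_body(b"already bytes"))      # b'already bytes'
```

In the client shown earlier, this would amount to passing data=to_utf8_body(xml) to self.post_api in post_xml_api.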