Always encode your Requests payloads in Python

At Bixoto, we use a lot of different APIs to interface with suppliers and other services. Today, I was working with an XML API using requests (via api_session) and xmltodict.

TL;DR: use requests.post(url, data=my_string.encode("utf-8")) and not requests.post(url, data=my_string).
Long version below:

The simplified code looked like this:

from api_session import APISession
import xmltodict


class TheClient(APISession):
    def post_xml_api(self, path: str, payload: dict) -> dict:
        # Transform a dict into an XML string
        xml = xmltodict.unparse(payload)

        # POST it to the API
        response = self.post_api(
            path,
            data=xml,
            headers={"Content-Type": "application/xml; charset=utf-8"},
        )
        # Parse the response XML as a dict again
        response.encoding = response.apparent_encoding
        return xmltodict.parse(response.text)

    def hello(self, name: str) -> str:
        res = self.post_xml_api("/hello", {"name": name})
        return res["message"]


# ...
client = TheClient(base_url="...")
print(client.hello("John"))  # => "Hello John!"

This worked great until I called client.hello() with a name that contained accents, such as “Élise”. The API provider complained that it wasn’t receiving UTF-8 data.

To debug the API client, I set up a simple server using nc in another terminal:

nc -l 1234

Then I used it as my base URL:

# note: this is a feature of api_session, not requests
client = TheClient(base_url="http://localhost:1234")
client.hello("Élise")

This is the resulting request:

POST /hello HTTP/1.1
Host: localhost:1234
User-Agent: python-requests/2.31.0
...
Content-Type: application/xml; charset=utf-8
Content-Length: 57

<?xml version="1.0" encoding="utf-8"?>
<name>�lise</name>

There was indeed an issue with the encoding. I thought that Python used UTF-8 everywhere by default, but that’s not the case here. The historical default charset for HTTP is ISO-8859-1, aka Latin-1 (see RFC 2616).
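The � in the captured request is consistent with a Latin-1 body being displayed as UTF-8. Here is a small sketch of that, using only built-in string methods. The body string is my reconstruction of the capture above, so treat it as an assumption rather than the exact bytes on the wire:

```python
name = "Élise"

latin1_bytes = name.encode("iso-8859-1")  # one byte for "É"
utf8_bytes = name.encode("utf-8")         # two bytes for "É"

print(latin1_bytes)  # b'\xc9lise'
print(utf8_bytes)    # b'\xc3\x89lise'

# Reading the Latin-1 bytes as UTF-8 fails on the lone 0xC9 byte;
# errors="replace" substitutes U+FFFD, the replacement character.
garbled = latin1_bytes.decode("utf-8", errors="replace")
print(garbled == "\ufffdlise")  # True

# Reconstructed request body (assumption: no trailing newline).
body = '<?xml version="1.0" encoding="utf-8"?>\n<name>Élise</name>'
print(len(body.encode("iso-8859-1")))  # 57
print(len(body.encode("utf-8")))       # 58
```

The Latin-1 length matches the Content-Length: 57 header in the capture, while a UTF-8 body would have been 58 bytes.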

Requests sends the body through Python’s http.client (via urllib3), whose documentation spells out that Latin-1 default:

If body is a string, it is encoded as ISO-8859-1, the default for HTTP.
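That documented behaviour can be observed with the standard library alone. The sketch below is not the setup I used (that was nc); it starts a throwaway local server that records raw request bodies, then posts the same text once as a str and once as UTF-8 bytes through http.client. The names CaptureHandler, post_body and received are made up for this example:

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

received = []  # raw request bodies, in arrival order


class CaptureHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly the number of bytes announced by the client.
        length = int(self.headers.get("Content-Length", 0))
        received.append(self.rfile.read(length))
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def post_body(port, body):
    conn = http.client.HTTPConnection("127.0.0.1", port)
    conn.request("POST", "/hello", body=body)
    conn.getresponse().read()
    conn.close()


server = HTTPServer(("127.0.0.1", 0), CaptureHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

post_body(port, "<name>Élise</name>")                  # str body
post_body(port, "<name>Élise</name>".encode("utf-8"))  # bytes body

server.shutdown()
server.server_close()

print(received[0])  # b'<name>\xc9lise</name>'
print(received[1])  # b'<name>\xc3\x89lise</name>'
```

Printing the bytes objects avoids depending on how a terminal renders them: the str body arrives with a single 0xC9 byte, the pre-encoded one with the UTF-8 pair 0xC3 0x89.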

The solution is to explicitly encode the request body:

# Before
response = requests.post(url, data=xml_string)
# After
response = requests.post(url, data=xml_string.encode("utf-8"))

That way, the body is already a bytes object, so http.client sends it as-is instead of encoding it as Latin-1 itself.
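One way to keep the declared charset and the actual bytes from drifting apart is to produce both in the same place. This is only a sketch: encode_xml_body is a hypothetical helper, not part of requests, api_session or xmltodict:

```python
def encode_xml_body(xml: str, charset: str = "utf-8") -> tuple[bytes, dict[str, str]]:
    """Encode an XML string and build a matching Content-Type header."""
    body = xml.encode(charset)
    headers = {"Content-Type": f"application/xml; charset={charset}"}
    return body, headers


body, headers = encode_xml_body("<name>Élise</name>")
print(body)     # b'<name>\xc3\x89lise</name>'
print(headers)  # {'Content-Type': 'application/xml; charset=utf-8'}

# The pair can then be passed along, for example:
# response = requests.post(url, data=body, headers=headers)
```

Since the header is derived from the same charset argument used for encoding, the body can no longer claim to be UTF-8 while being sent as something else.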