micropython / micropython-lib

Core Python libraries ported to MicroPython

aiohttp and requests Content-Length off if json data contains non-ascii characters

bixb922 opened this issue · comments

I'm sorry to bother again with aiohttp. Having an asyncio requests module with ssl support made me rewrite all code to take advantage of this excellent development. Thanks to you all!

When sending JSON data in a request with the json= parameter, if the JSON contains non-ASCII (multi-byte) characters, the Content-Length header is smaller than the body's size in bytes, so the server receives a truncated body.

This is with MicroPython v1.22.0 on 2023-12-27; Generic ESP32S3 module with Octal-SPIRAM with ESP32S3

I see this happening both with aiohttp and the requests (former urequests) module.

This is the test program:

import aiohttp
import asyncio
import requests
import json

async def main():
    # do_connect() is not shown; it is assumed to bring up the network connection.
    do_connect()
    json_info = {"tildes": "áéíóúñ"}
    json_str = json.dumps(json_info)
    print(f"JSON length {len(json_str)} characters, {len(json_str.encode())} bytes")
    async with aiohttp.ClientSession("http://httpbin.org") as session:
        async with session.post("/post", json=json_info) as resp:
            assert resp.status == 200
            rpost = await resp.text()
            print(f"aiohttp POST: {rpost}")
    
    resp = requests.request("POST", "http://httpbin.org/post", json=json_info)
    print(f"requests POST: {resp.text}")
    
    
asyncio.run(main())

The output is:

JSON length 20 characters, 26 bytes
aiohttp POST: {
  "args": {}, 
  "data": "{\"tildes\": \"\u00e1\u00e9\u00ed\u00f3", 
  "files": {}, 
  "form": {}, 
  "headers": {
    "Content-Length": "20", 
    "Content-Type": "application/json", 
    "Host": "httpbin.org", 
    "User-Agent": "compat", 
    "X-Amzn-Trace-Id": "Root=1-6595d395-0086a9e07313785a0a4b7d26"
  }, 
  "json": null, 
  "origin": "181.43.38.11", 
  "url": "http://httpbin.org/post"
}

requests POST: {
  "args": {}, 
  "data": "{\"tildes\": \"\u00e1\u00e9\u00ed\u00f3", 
  "files": {}, 
  "form": {}, 
  "headers": {
    "Content-Length": "20", 
    "Content-Type": "application/json", 
    "Host": "httpbin.org", 
    "X-Amzn-Trace-Id": "Root=1-6595d395-04fc454470e156273acf6038"
  }, 
  "json": null, 
  "origin": "181.43.38.11", 
  "url": "http://httpbin.org/post"
}

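It seems that both modules compute Content-Length with len() on the JSON string, which counts characters, whereas Content-Length must be the length in bytes. Unlike CPython, whose json.dumps escapes non-ASCII characters as \uXXXX by default, MicroPython's json.dumps has no ensure_ascii parameter and leaves them unescaped, so they become multi-byte sequences once the string is encoded as UTF-8.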
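
To illustrate the difference, here is a minimal sketch of computing the header from the encoded bytes instead of the string. It is not the actual code of either module: encode_json_body is a hypothetical helper, and ensure_ascii=False is only there so that CPython approximates the MicroPython output shown above.

```python
import json


def encode_json_body(obj):
    """Hypothetical helper: serialize obj and size the body in bytes."""
    # ensure_ascii=False keeps non-ASCII characters unescaped on CPython,
    # approximating what MicroPython's json.dumps produced in this report.
    text = json.dumps(obj, ensure_ascii=False)
    body = text.encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        # Content-Length counts bytes on the wire, so measure the encoded body.
        "Content-Length": str(len(body)),
    }
    return text, body, headers


text, body, headers = encode_json_body({"tildes": "áéíóúñ"})
print(f"{len(text)} characters, {len(body)} bytes")
print("Content-Length:", headers["Content-Length"])

# Cutting the body at the character count reproduces the truncated
# "data" field that httpbin echoed back.
truncated = body[:len(text)].decode("utf-8")
print("truncated body:", truncated)
```

Each of the six accented characters takes two bytes in UTF-8, which accounts for the gap between 20 characters and 26 bytes, and for the body ending after "ó" when only 20 bytes are read.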

@bixb922 Thanks for the report. I see now that I was computing the length before encoding; I wasn't aware that it could change with non-ASCII characters. This is now fixed in #782. If you find anything else, let me know 👍🏼

Works well for me now, both binary data and multibyte json, thanks a lot!