POST requests with Unicode characters in a str passed as data send incomplete/trimmed data (requests >=2.30 with urllib3 2.0)
viiand opened this issue
With requests>=2.30.0 (using urllib3 2.x), when you send a POST request with data=<a str containing non-ASCII characters>, the server receives incomplete/trimmed data. This happens because the Content-Length header set by requests does not match the data actually sent: it is set to len(data), the number of characters, yet the body sent is data.encode(), which is longer because each non-ASCII character takes more than one byte.
When requests runs on urllib3 1.x, the same call raises an exception instead, for example:
UnicodeEncodeError: 'latin-1' codec can't encode character '\u0159' in position 10: Body ('ř') is not valid Latin-1. Use body.encode('utf-8') if you want to send it encoded in UTF-8.
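The error message above already hints at a workaround that should also avoid the truncation on urllib3 2.x: encode the string yourself and pass bytes, so that the length is computed from the bytes rather than from the characters. A minimal sketch, assuming the server expects UTF-8; the helper name utf8_body and the text/plain Content-Type are illustrative assumptions, not part of requests:

```python
def utf8_body(text, charset="utf-8"):
    """Return (body, headers) ready to pass to requests.post().

    Hypothetical helper: encodes the str up front so the client sees
    bytes, and declares the charset so the server can decode them.
    """
    body = text.encode(charset)
    # Assumption: the payload is plain text; adjust the media type as needed.
    headers = {"Content-Type": f"text/plain; charset={charset}"}
    return body, headers


body, headers = utf8_body("Lidová tvořivost na sebe nenechala dlouho čekat.")
print(len(body), headers["Content-Type"])

# Usage with requests (not executed here):
#   requests.post("http://localhost:8083/echo", data=body, headers=headers)
```

Passing bytes keeps the decision about the encoding explicit on the caller's side instead of relying on how the HTTP stack treats str bodies.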
Expected Result
One of the following:
a) send the complete data (with the Content-Length header == len(data.encode()), not len(data))
b) raise an exception as urllib3 1.x versions did, suggesting the str should be UTF-8 encoded
c) raise an exception telling the user they are not supposed to pass strings to the data argument
Actual Result
No error is reported on the client side, nor is the data sent correctly. Server reads incomplete data.
Reproduction Steps
import requests
d = "Lidová tvořivost na sebe nenechala dlouho čekat."
requests.post('http://localhost:8083/echo', data=d)
The server receives
Lidová tvořivost na sebe nenechala dlouho ček
in the request body, which is incomplete: the trailing "at." is missing.
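The truncation can be reproduced without any network traffic. The sketch below only models the behavior described in this report (Content-Length taken from len(data) while the encoded bytes are sent); it does not call requests or urllib3, and the variable names are illustrative:

```python
# The payload from the reproduction steps: three of its characters
# (á, ř, č) are non-ASCII and take two bytes each in UTF-8.
data = "Lidová tvořivost na sebe nenechala dlouho čekat."

char_count = len(data)           # value reportedly used for Content-Length
encoded = data.encode("utf-8")   # bytes that actually go on the wire
byte_count = len(encoded)

# A server that honors Content-Length reads only the first char_count bytes.
received = encoded[:char_count].decode("utf-8", errors="replace")

print(char_count, byte_count)    # 48 51
print(received)                  # Lidová tvořivost na sebe nenechala dlouho ček
```

The three-byte difference between the two lengths matches the three characters ("at.") missing on the server side.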
System Information
$ python -m requests.help
{
"chardet": {
"version": null
},
"charset_normalizer": {
"version": "2.0.12"
},
"cryptography": {
"version": ""
},
"idna": {
"version": "3.6"
},
"implementation": {
"name": "CPython",
"version": "3.11.8"
},
"platform": {
"release": "22.6.0",
"system": "Darwin"
},
"pyOpenSSL": {
"openssl_version": "",
"version": null
},
"requests": {
"version": "2.31.0"
},
"system_ssl": {
"version": "300000d0"
},
"urllib3": {
"version": "2.2.1"
},
"using_charset_normalizer": true,
"using_pyopenssl": false
}