Post requests with unicode chars in string as data sends incomplete/trimmed data (in >=2.30 with urllib3 2.0)

Question

viiand opened this issue 3 months ago · comments

In requests>=2.30.0 (with urllib3 2.x), when you send a POST request with data=<a string containing Unicode characters>, the server receives incomplete (truncated) data. This happens because the Content-Length header set by requests does not match the data actually sent: the header is set to len(data), the number of characters, but the body sent is data.encode(), which is longer in bytes whenever the string contains non-ASCII characters.

If requests uses urllib3 1.x, the same call raises an exception instead, for example:

UnicodeEncodeError: 'latin-1' codec can't encode character '\u0159' in position 10: Body ('ř') is not valid Latin-1. Use body.encode('utf-8') if you want to send it encoded in UTF-8.
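
The codec failure behind this message can be reproduced with the standard library alone. This is only a minimal sketch of the encoding step, not of urllib3's own code path: 'ř' (U+0159) has no Latin-1 representation, while 'á' (U+00E1) does, which is consistent with the error pointing at position 10 rather than position 5 of the reproduction string.

```python
body = "ř"  # U+0159, the character named in the error message above

# Latin-1 covers only U+0000..U+00FF, so this character cannot be encoded.
try:
    body.encode("latin-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

# UTF-8 can encode it, but it takes two bytes for this one character.
utf8_bytes = body.encode("utf-8")

print(latin1_ok)                    # False
print(utf8_bytes)                   # b'\xc5\x99'
print(len(body), len(utf8_bytes))   # 1 2
```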

Expected Result

Either:
a) send the complete data (where the Content-Length header == len(data.encode()), not len(data))
b) raise an exception as urllib3 1.x versions did, suggesting that the str should be UTF-8 encoded
c) raise an exception telling the user they are not supposed to pass strings to the data argument
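
Until one of these behaviours is available, a caller can get the effect of option (a) by encoding the string before handing it to requests, as the urllib3 1.x error message suggests. The sketch below is a possible workaround, not the library's fix; `to_utf8_body` is a hypothetical helper name and the sample string is only illustrative.

```python
def to_utf8_body(data):
    """Hypothetical helper: encode str bodies to UTF-8 bytes.

    With bytes, len() counts bytes, so a Content-Length derived
    from it matches what is actually sent on the wire.
    """
    if isinstance(data, str):
        return data.encode("utf-8")
    return data  # bytes and other body types pass through unchanged


payload = to_utf8_body("tvořivost")  # sample string, 9 characters
print(len(payload))  # 10

# Assumed usage with requests (not executed here):
# requests.post("http://localhost:8083/echo", data=payload)
```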

Actual Result

No error is reported on the client side, but the data is not sent correctly: the server reads an incomplete body.

Reproduction Steps

import requests
d = "Lidová tvořivost na sebe nenechala dlouho čekat."
requests.post('http://localhost:8083/echo', data=d)

The server receives "Lidová tvořivost na sebe nenechala dlouho ček" in the request body, which is incomplete: the last three bytes ("at.") are missing.
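
The truncation can be checked by hand without a server. The sketch below only simulates a server that reads exactly Content-Length bytes; it assumes the body is UTF-8 encoded and that the header carries the character count, as described in this report.

```python
d = "Lidová tvořivost na sebe nenechala dlouho čekat."

encoded = d.encode("utf-8")      # bytes actually written to the socket
declared_length = len(d)         # what Content-Length is set to (characters)
actual_length = len(encoded)     # what it should be (bytes)

# A server that trusts Content-Length stops reading after declared_length bytes.
received = encoded[:declared_length].decode("utf-8")

print(declared_length, actual_length)  # 48 51
print(received)  # Lidová tvořivost na sebe nenechala dlouho ček
```

The three non-ASCII characters (á, ř, č) each take two bytes in UTF-8, so the body is three bytes longer than the declared length and exactly three bytes are cut off.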

System Information

$ python -m requests.help

{
  "chardet": {
    "version": null
  },
  "charset_normalizer": {
    "version": "2.0.12"
  },
  "cryptography": {
    "version": ""
  },
  "idna": {
    "version": "3.6"
  },
  "implementation": {
    "name": "CPython",
    "version": "3.11.8"
  },
  "platform": {
    "release": "22.6.0",
    "system": "Darwin"
  },
  "pyOpenSSL": {
    "openssl_version": "",
    "version": null
  },
  "requests": {
    "version": "2.31.0"
  },
  "system_ssl": {
    "version": "300000d0"
  },
  "urllib3": {
    "version": "2.2.1"
  },
  "using_charset_normalizer": true,
  "using_pyopenssl": false
}

Nate Prewitt · Answer 1 · Fri Mar 08 2024 03:15:20 GMT+0800 (China Standard Time)

Hi @viiand, this is a duplicate of #6586, which is already fixed in #6589. Please check open and closed issues before opening new ones. Thanks!