kevin1024 / vcrpy

Automatically mock your HTTP interactions to simplify and speed up testing

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

[>=4.3.1] Chunked uploads are broken with urllib3 v2

sathieu opened this issue · comments

With vcrpy 5.0.0, when updating urllib3 from 1.26.16 to 2.0.3, chunked uploads are failing.

This is because the body appears to vcrpy unchunked:

(Pdb) self.cassette.requests[18].url == self._vcr_request.url
True
(Pdb) self.cassette.requests[18].body                         
b'45\r\nBlob bf9500cc6aa4fac1a75c958496c9e9caeceacb256113e28836fc5b3bd448f706\r\n0\r\n\r\n'
(Pdb) [c for c in self._vcr_request.body]
[b'Blob bf9500cc6aa4fac1a75c958496c9e9caeceacb256113e28836fc5b3bd448f706']
(Pdb) self._vcr_request.headers          
{'User-Agent': 'gitlabracadabra/1.13.0', 'Docker-Distribution-Api-Version': 'registry/2.0', 'Authorization': 'Bearer eyJhbGciOiJSUzI1NiIsImtpZCI6IlJJWTM6SUhGMjpZUlhVOlJGTVI6NEVRRDpJWVZDOjZMSzM6MzVJTDpGRFJGOklKUE46Q0MySjpMTEIzIiwidHlwIjoiSldUIn0.eyJhY2Nlc3MiOlt7InR5cGUiOiJyZXBvc2l0b3J5IiwibmFtZSI6InJvb3QvdGVzdF9yZWdpc3RyeS9vcGVyYXRvci1mcmFtZXdvcmsvb2xtIiwiYWN0aW9ucyI6WyJwdXNoIiwicHVsbCJdfV0sImp0aSI6ImM5ZGQ1YWQxLWQ0M2ItNDI0Yi1iZDUwLWU4NzJlYjYxZjYwNSIsImF1ZCI6ImNvbnRhaW5lcl9yZWdpc3RyeSIsInN1YiI6InJvb3QiLCJpc3MiOiJnaXRsYWItaXNzdWVyIiwiaWF0IjoxNjE3MTkzMjE5LCJuYmYiOjE2MTcxOTMyMTQsImV4cCI6MTYxNzE5MzUxOX0.b6PKFISAEPnDnfTKR8XVIHUOc6JYomUOB-u_w3QQIqF8gkb92n199zvFSwcpOZolcXjk1DvCfMtUkOvklzAgXyIwNcBx7t_o_SO4mCqWEnbfWxzhJKKSPm6YLAC0IoQbxCsPfA4CK7XEOCJU7eTLaCTqLvq6GvmpjtyLF-KgZV8eUV4zPDNYkhvUqcJXSQR_UHqQ4k32h8itBGYzfhykHSP2B2ZvbBDHLYmBlsDFtc7LJVUEEvKKhCR1VQppwgIoHwJwu7OR3R6I5_XO23NwBbPYvAQZC9twPBKNvoPfcToMzSF_keuAVodUgAyYMUHmnFPw5jX0-PcPt0L9j6a9rA', 'Cookie': 'perf_bar_enabled=true; experimentation_subject_id=eyJfcmFpbHMiOnsibWVzc2FnZSI6IkltSmpaRFZrTXpRMkxXWXlPR010TkRZek1DMWhNMll5TFRNMlpEYzNZVFJrWWpsa09DST0iLCJleHAiOm51bGwsInB1ciI6ImNvb2tpZS5leHBlcmltZW50YXRpb25fc3ViamVjdF9pZCJ9fQ%3D%3D--403d642cd6f4aaa4365664b26bec713ff089b313', 'Transfer-Encoding': 'chunked'}
(Pdb) self.cassette.requests[18].headers                     
{'Authorization': 'Bearer eyJhbGciOiJSUzI1NiIsImtpZCI6IlJJWTM6SUhGMjpZUlhVOlJGTVI6NEVRRDpJWVZDOjZMSzM6MzVJTDpGRFJGOklKUE46Q0MySjpMTEIzIiwidHlwIjoiSldUIn0.eyJhY2Nlc3MiOlt7InR5cGUiOiJyZXBvc2l0b3J5IiwibmFtZSI6InJvb3QvdGVzdF9yZWdpc3RyeS9vcGVyYXRvci1mcmFtZXdvcmsvb2xtIiwiYWN0aW9ucyI6WyJwdXNoIiwicHVsbCJdfV0sImp0aSI6ImM5ZGQ1YWQxLWQ0M2ItNDI0Yi1iZDUwLWU4NzJlYjYxZjYwNSIsImF1ZCI6ImNvbnRhaW5lcl9yZWdpc3RyeSIsInN1YiI6InJvb3QiLCJpc3MiOiJnaXRsYWItaXNzdWVyIiwiaWF0IjoxNjE3MTkzMjE5LCJuYmYiOjE2MTcxOTMyMTQsImV4cCI6MTYxNzE5MzUxOX0.b6PKFISAEPnDnfTKR8XVIHUOc6JYomUOB-u_w3QQIqF8gkb92n199zvFSwcpOZolcXjk1DvCfMtUkOvklzAgXyIwNcBx7t_o_SO4mCqWEnbfWxzhJKKSPm6YLAC0IoQbxCsPfA4CK7XEOCJU7eTLaCTqLvq6GvmpjtyLF-KgZV8eUV4zPDNYkhvUqcJXSQR_UHqQ4k32h8itBGYzfhykHSP2B2ZvbBDHLYmBlsDFtc7LJVUEEvKKhCR1VQppwgIoHwJwu7OR3R6I5_XO23NwBbPYvAQZC9twPBKNvoPfcToMzSF_keuAVodUgAyYMUHmnFPw5jX0-PcPt0L9j6a9rA', 'Cookie': 'experimentation_subject_id=eyJfcmFpbHMiOnsibWVzc2FnZSI6IkltSmpaRFZrTXpRMkxXWXlPR010TkRZek1DMWhNMll5TFRNMlpEYzNZVFJrWWpsa09DST0iLCJleHAiOm51bGwsInB1ciI6ImNvb2tpZS5leHBlcmltZW50YXRpb25fc3ViamVjdF9pZCJ9fQ%3D%3D--403d642cd6f4aaa4365664b26bec713ff089b313; perf_bar_enabled=true', 'Docker-Distribution-Api-Version': 'registry/2.0', 'Transfer-Encoding': 'chunked'}

Notes:

  • The fix is probably similar to #723.
  • The affected cassette is here
  • I use a custom body matcher, but I don't think this is relevant here (except that this could be a way to workaround)

Hi @sathieu, I have not been successful in reproducing this issue yet. Could you provide a minimal reproducer in Python with VCR.py 5.0.0 and chunked transfer that works with urllib3 v2 but not v2?


In case the log at https://gitlab.com/gitlabracadabra/gitlabracadabra/-/jobs/4563346377 disappears, I am attaching a full raw copy here: full_job.log

@hartwork Here is a minimal reproducer:

testcase.tar.gz

works with requirements-good.txt, fails with requirements-bad.txt.

Hi @sathieu thanks for these files, well done! 🙏

I confirm that this works with urllib3 v1 but not v2. My understanding is that the request matcher that compres request bodies does not consider these arguably equal bodies as equal.

Comparision of bodies is done as follows:

vcrpy/vcr/matchers.py

Lines 51 to 57 in 4f70152

def body(r1, r2):
transformer = _get_transformer(r1)
r2_transformer = _get_transformer(r2)
if transformer != r2_transformer:
transformer = _identity
if transformer(read_body(r1)) != transformer(read_body(r2)):
raise AssertionError

So both sides are transformed before comparison. Which transformer that is is looked up here…

vcrpy/vcr/matchers.py

Lines 99 to 104 in 4f70152

def _get_transformer(request):
for checker, transformer in _checker_transformer_pairs:
if checker(request.headers):
return transformer
else:
return _identity

…and it is fed by this block here:

vcrpy/vcr/matchers.py

Lines 85 to 92 in 4f70152

_checker_transformer_pairs = (
(
_header_checker("application/x-www-form-urlencoded"),
lambda body: urllib.parse.parse_qs(body.decode("ascii")),
),
(_header_checker("application/json"), _transform_json),
(lambda request: _xml_header_checker(request) and _xmlrpc_header_checker(request), xmlrpc.client.loads),
)

Now that we have that: Maybe one way to fix this is to add a new transformer there? I have a demo of something like that on a new branch issue-734-fix-body-matcher-for-chunked-requests now. Could you have a look? What do you think?

One more thing: method no_read that you added — where can I learn more about that and its purpose, is that some common API? Thanks!

@hartwork Thanks !

Adding a transformer would work, but the question is why the body is unchuncked with the Transfer-Encoding: chunked header still present? Maybe urllib3/urllib3#2565 (see also urllib3/urllib3#2291 (comment), especially If Content-Length can be determined, set Content-Length and proceed not chunked.)?

NB: this no_read method was a test (for the read method). It was not needed, and you can ignore it.

@sathieu thanks for your reply!

I have fixed multiple bugs in the demo in branch issue-734-fix-body-matcher-for-chunked-requests in the meantime.

Regarding the difference, with urllib3 v1 request.body is of type bytearray while with v2 it's of type Stream, i.e. your custom class.

@sathieu I'm at a point now where the code is ready for proper review and testing: it's new pull request #739. If someone has ideas for a simpler solution, happy to do something simpler.