Download fails with 400, message='Can not decode content-encoding: gzip'
wtbarnes opened this issue
When trying to download a (seemingly) plain text file, the download fails and I'm getting the following error,
[<parfive.results.Error object at 0x1078a66d0>
https://sohoftp.nascom.nasa.gov/solarsoft/sdo/aia/response/aia_V3_error_table.txt
400, message='Can not decode content-encoding: gzip']
The code to reproduce this is,
import parfive
dl = parfive.Downloader()
dl.enqueue_file('https://sohoftp.nascom.nasa.gov/solarsoft/sdo/aia/response/aia_V3_error_table.txt', path='.')
foo = dl.download()
parfive version: 2.0.2
Python version: 3.9
OS: macOS 12.6.2
Solution detailed here: #121 (comment)
Can you give me a full print out of the http response headers? (Either with httpie/curl or debug logging?)
If it helps, using aiohttp directly works:
import aiohttp
import asyncio

async def main():
    async with aiohttp.ClientSession() as session:
        async with session.get('https://sohoftp.nascom.nasa.gov/solarsoft/sdo/aia/response/aia_V9_20200706_215452_response_table.txt') as response:
            print("Status:", response.status)
            print([f"{key}: {response.headers[key]}" for key in response.headers])
            html = await response.text()
            print("Body:", html, "...")

asyncio.run(main())
'Date: Tue, 10 Jan 2023 20:19:41 GMT'
'Server: Apache'
'Strict-Transport-Security: max-age=31536000; includeSubdomains;'
'Last-Modified: Tue, 06 Jul 2021 21:57:00 GMT'
'Etag: "4d58-5c67b8056c300-gzip"'
'Accept-Ranges: bytes'
'Vary: Accept-Encoding'
'Content-Encoding: gzip'
'Content-Length: 2156'
'Content-Type: text/plain'
Body: DATE T_START T_STOP VER_NUM WAVE_STR WAVELNTH EPERDN DNPERPHT EFF_AREA EFF_WVLN EFFA_P1 EFFA_P2 EFFA_P3 RMSE
$ curl --head https://sohoftp.nascom.nasa.gov/solarsoft/sdo/aia/response/aia_V3_error_table.txt
HTTP/1.1 200 OK
Date: Tue, 10 Jan 2023 21:43:43 GMT
Server: Apache
Strict-Transport-Security: max-age=31536000; includeSubdomains;
Last-Modified: Thu, 27 Sep 2012 13:22:00 GMT
ETag: "72d-4caaed300ae00"
Accept-Ranges: bytes
Content-Length: 1837
Vary: Accept-Encoding
Content-Type: text/plain
The filename in my original post was actually not the right one (fixed now), but I believe both show the same issue.
Here's the curl request for a gzip-ed response, and piped to gunzip to decompress the response:
$ curl -v -H 'Accept-encoding: gzip' https://sohoftp.nascom.nasa.gov/solarsoft/sdo/aia/response/aia_V3_error_table.txt | gunzip -
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0* Trying 198.118.248.247:443...
* Connected to sohoftp.nascom.nasa.gov (198.118.248.247) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* CAfile: /etc/ssl/certs/ca-certificates.crt
* CApath: /etc/ssl/certs
* TLSv1.0 (OUT), TLS header, Certificate Status (22):
} [5 bytes data]
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
} [512 bytes data]
* TLSv1.2 (IN), TLS header, Certificate Status (22):
{ [5 bytes data]
* TLSv1.3 (IN), TLS handshake, Server hello (2):
{ [122 bytes data]
* TLSv1.2 (IN), TLS header, Finished (20):
{ [5 bytes data]
* TLSv1.2 (IN), TLS header, Supplemental data (23):
{ [5 bytes data]
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
{ [25 bytes data]
* TLSv1.2 (IN), TLS header, Supplemental data (23):
{ [5 bytes data]
* TLSv1.3 (IN), TLS handshake, Certificate (11):
{ [2682 bytes data]
* TLSv1.2 (IN), TLS header, Supplemental data (23):
{ [5 bytes data]
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
{ [264 bytes data]
* TLSv1.2 (IN), TLS header, Supplemental data (23):
{ [5 bytes data]
* TLSv1.3 (IN), TLS handshake, Finished (20):
{ [52 bytes data]
* TLSv1.2 (OUT), TLS header, Finished (20):
} [5 bytes data]
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
} [1 bytes data]
* TLSv1.2 (OUT), TLS header, Supplemental data (23):
} [5 bytes data]
* TLSv1.3 (OUT), TLS handshake, Finished (20):
} [52 bytes data]
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server accepted to use http/1.1
* Server certificate:
* subject: CN=sohoftp.nascom.nasa.gov
* start date: Dec 6 22:29:40 2022 GMT
* expire date: Mar 6 22:29:39 2023 GMT
* subjectAltName: host "sohoftp.nascom.nasa.gov" matched cert's "sohoftp.nascom.nasa.gov"
* issuer: C=US; O=Let's Encrypt; CN=R3
* SSL certificate verify ok.
* TLSv1.2 (OUT), TLS header, Supplemental data (23):
} [5 bytes data]
> GET /solarsoft/sdo/aia/response/aia_V3_error_table.txt HTTP/1.1
> Host: sohoftp.nascom.nasa.gov
> User-Agent: curl/7.81.0
> Accept: */*
> Accept-encoding: gzip
>
* TLSv1.2 (IN), TLS header, Supplemental data (23):
{ [5 bytes data]
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
{ [57 bytes data]
* TLSv1.2 (IN), TLS header, Supplemental data (23):
{ [5 bytes data]
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
{ [57 bytes data]
* old SSL session ID is stale, removing
* TLSv1.2 (IN), TLS header, Supplemental data (23):
{ [5 bytes data]
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Date: Wed, 11 Jan 2023 14:42:18 GMT
< Server: Apache
< Strict-Transport-Security: max-age=31536000; includeSubdomains;
< Last-Modified: Thu, 27 Sep 2012 13:22:00 GMT
< ETag: "72d-4caaed300ae00-gzip"
< Accept-Ranges: bytes
< Vary: Accept-Encoding
< Content-Encoding: gzip
< Content-Length: 356
< Content-Type: text/plain
<
{ [356 bytes data]
100 356 100 356 0 0 1096 0 --:--:-- --:--:-- --:--:-- 1098
* Connection #0 to host sohoftp.nascom.nasa.gov left intact
DATE T_START T_STOP VER_NUM WAVE_STR WAVELNTH DNPERPHT COMPRESS CALERR CHIANTI EVEERR
2012-02-10T01:12:01.000 2010-05-01T00:00:00.000 2050-05-01T00:00:00.000 3 94_THIN 94 1.975 26.196 0.250 0.500 0.087
2012-02-10T01:12:01.000 2010-05-01T00:00:00.000 2050-05-01T00:00:00.000 3 131_THIN 131 1.473 18.797 0.250 0.500 0.051
2012-02-10T01:12:01.000 2010-05-01T00:00:00.000 2050-05-01T00:00:00.000 3 171_THIN 171 1.122 14.400 0.250 0.250 0.019
2012-02-10T01:12:01.000 2010-05-01T00:00:00.000 2050-05-01T00:00:00.000 3 193_THIN 193 0.962 12.759 0.250 0.250 0.014
2012-02-10T01:12:01.000 2010-05-01T00:00:00.000 2050-05-01T00:00:00.000 3 211_THIN 211 0.880 11.670 0.250 0.250 0.019
2012-02-10T01:12:01.000 2010-05-01T00:00:00.000 2050-05-01T00:00:00.000 3 304_THIN 304 0.611 8.100 0.250 0.500 0.023
2012-02-10T01:12:01.000 2010-05-01T00:00:00.000 2050-05-01T00:00:00.000 3 335_THIN 335 0.576 7.350 0.250 0.250 0.097
2012-02-10T01:12:01.000 2010-05-01T00:00:00.000 2050-05-01T00:00:00.000 3 1600 1600 0.120 1.539 0.500 1.000 0.012
2012-02-10T01:12:01.000 2010-05-01T00:00:00.000 2050-05-01T00:00:00.000 3 1700 1700 0.113 0.362 0.500 1.000 0.035
2012-02-10T01:12:01.000 2010-05-01T00:00:00.000 2050-05-01T00:00:00.000 3 4500 4500 0.056 0.068 0.500 1.000 0.030
That is, the server appears to be returning a valid gzip-ed response, which is consistent with the fact that aiohttp by itself appears to have no problem.
Now that I've added a stream handler to the parfive logger so that I can actually see the parfive debug logging, it's clearer what is going on. The parfive error is related to the fact that parfive is splitting this already tiny file into super-tiny 72-byte requests. There is no error if you explicitly specify max_splits=1 when instantiating the downloader, which is equivalent to the lack of splitting when using curl or the simple aiohttp example above.
Here's example parfive debug output:
GET request made to https://sohoftp.nascom.nasa.gov/solarsoft/sdo/aia/response/aia_V3_error_table.txt with headers=<CIMultiDictProxy('Host': 'sohoftp.nascom.nasa.gov', 'User-Agent': 'parfive/2.0.2 aiohttp/3.8.3 python/3.10.6', 'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate')>
200 Response received from https://sohoftp.nascom.nasa.gov/solarsoft/sdo/aia/response/aia_V3_error_table.txt with headers=<CIMultiDictProxy('Date': 'Wed, 11 Jan 2023 15:30:19 GMT', 'Server': 'Apache', 'Strict-Transport-Security': 'max-age=31536000; includeSubdomains;', 'Last-Modified': 'Thu, 27 Sep 2012 13:22:00 GMT', 'Etag': '"72d-4caaed300ae00-gzip"', 'Accept-Ranges': 'bytes', 'Vary': 'Accept-Encoding', 'Content-Encoding': 'gzip', 'Content-Length': '356', 'Content-Type': 'text/plain')>
GET request made for download to https://sohoftp.nascom.nasa.gov/solarsoft/sdo/aia/response/aia_V3_error_table.txt with headers=<CIMultiDictProxy('Host': 'sohoftp.nascom.nasa.gov', 'User-Agent': 'parfive/2.0.2 aiohttp/3.8.3 python/3.10.6', 'Range': 'bytes=0-71', 'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate')>
206 Response received from https://sohoftp.nascom.nasa.gov/solarsoft/sdo/aia/response/aia_V3_error_table.txt with headers=<CIMultiDictProxy('Date': 'Wed, 11 Jan 2023 15:30:19 GMT', 'Server': 'Apache', 'Strict-Transport-Security': 'max-age=31536000; includeSubdomains;', 'Last-Modified': 'Thu, 27 Sep 2012 13:22:00 GMT', 'Etag': '"72d-4caaed300ae00-gzip"', 'Accept-Ranges': 'bytes', 'Vary': 'Accept-Encoding', 'Content-Encoding': 'gzip', 'Content-Range': 'bytes 0-71/356', 'Content-Length': '72', 'Content-Type': 'text/plain')>
GET request made for download to https://sohoftp.nascom.nasa.gov/solarsoft/sdo/aia/response/aia_V3_error_table.txt with headers=<CIMultiDictProxy('Host': 'sohoftp.nascom.nasa.gov', 'User-Agent': 'parfive/2.0.2 aiohttp/3.8.3 python/3.10.6', 'Range': 'bytes=213-284', 'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate')>
206 Response received from https://sohoftp.nascom.nasa.gov/solarsoft/sdo/aia/response/aia_V3_error_table.txt with headers=<CIMultiDictProxy('Date': 'Wed, 11 Jan 2023 15:30:19 GMT', 'Server': 'Apache', 'Strict-Transport-Security': 'max-age=31536000; includeSubdomains;', 'Last-Modified': 'Thu, 27 Sep 2012 13:22:00 GMT', 'Etag': '"72d-4caaed300ae00-gzip"', 'Accept-Ranges': 'bytes', 'Vary': 'Accept-Encoding', 'Content-Encoding': 'gzip', 'Content-Range': 'bytes 213-284/356', 'Content-Length': '72', 'Content-Type': 'text/plain')>
1/0 files failed to download. Please check `.errors` for details
https://sohoftp.nascom.nasa.gov/solarsoft/sdo/aia/response/aia_V3_error_table.txt failed to download with exception
400, message='Can not decode content-encoding: gzip'
Note that it appeared to error on just the second 72-byte chunk here. Seemingly sometimes that chunk succeeds, and a different chunk errors. I'm not sure why some chunks fail sometimes.
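As a rough illustration of the arithmetic (this is not parfive's actual code), the Range headers in the log above are consistent with cutting the 356-byte Content-Length into five pieces. The function name split_ranges and the choice of five splits are assumptions made for this sketch; the log only shows the resulting ranges.

```python
def split_ranges(content_length: int, n_splits: int) -> list[tuple[int, int]]:
    """Hypothetical splitter: return inclusive (start, end) byte ranges."""
    step = max(1, content_length // n_splits)
    last = content_length - 1  # HTTP byte positions are zero-based and inclusive
    return [
        (start, min(start + step, last))
        for start in range(0, max(last, 1), step)
    ]


# 356 is the Content-Length reported for the gzip-encoded response in the log.
ranges = split_ranges(356, 5)
for start, end in ranges:
    print(f"Range: bytes={start}-{end}")
```

Each range covers 72 bytes, and all of them index into the compressed representation, since the Content-Length was 356 rather than the 1837 bytes reported for the uncompressed file.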
Ah, it's actually failing on any chunk other than the first one. Here's what I believe is happening, and it traces back to aiohttp.
Since Content-Type is text/plain, and Accept-Encoding includes gzip, the web server will actually gzip-compress the text file before complying with any request for specific bytes. parfive determines the byte chunks based on Content-Length, which is the length of the compressed data, so when parfive sends the request for bytes 0–71 of the file, it actually gets bytes 0–71 of the compressed file, rather than a compressed version of bytes 0–71 of the original file.
Setting aside the fact that the bytes aren't the same, the problem is that each individual chunk is not a complete gzip-compressed payload, but rather just a partial segment of one. However, aiohttp sees Content-Encoding is gzip for each 72-byte chunk and will try to uncompress each chunk separately. For the first chunk, it will somewhat succeed (but gunzip sees the file terminate abruptly). For every other chunk, since they lack the magic bytes at the start, gunzip outright fails.
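The same behaviour can be reproduced offline with only the standard library. This is a minimal sketch: the payload is made-up stand-in data rather than the real table, try_gunzip is a hypothetical helper, and the slice offsets simply mirror the 72-byte ranges discussed here.

```python
import gzip
import zlib

# Stand-in for the text file (made-up rows, not the real table).
original = "".join(f"row {i} value {i * i}\n" for i in range(200)).encode()
compressed = gzip.compress(original)


def try_gunzip(chunk: bytes) -> str:
    """Decode one chunk as if it were a standalone gzip stream."""
    decoder = zlib.decompressobj(31)  # wbits=31: expect a gzip header and trailer
    try:
        data = decoder.decompress(chunk)
    except zlib.error as exc:
        return f"error: {exc}"
    # eof is only set once the gzip trailer has been reached.
    state = "complete" if decoder.eof else "truncated"
    return f"{state}, {len(data)} bytes recovered"


first = compressed[0:72]     # like Range: bytes=0-71
later = compressed[142:214]  # like Range: bytes=142-213

print("first chunk:", try_gunzip(first))
print("later chunk:", try_gunzip(later))
print("whole body: ", try_gunzip(compressed))
```

The first slice carries the gzip header, so decoding starts but never reaches the end of the stream; the later slice has no header and is rejected immediately; only the full body decodes completely.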
Using curl to request bytes 0–71, then piped to gunzip:
$ curl -v -H 'Accept-encoding: gzip' -H 'Range: bytes=0-71' https://sohoftp.nascom.nasa.gov/solarsoft/sdo/aia/response/aia_V3_error_table.txt | gunzip -
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0* Trying 198.118.248.247:443...
* Connected to sohoftp.nascom.nasa.gov (198.118.248.247) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* CAfile: /etc/ssl/certs/ca-certificates.crt
* CApath: /etc/ssl/certs
* TLSv1.0 (OUT), TLS header, Certificate Status (22):
} [5 bytes data]
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
} [512 bytes data]
* TLSv1.2 (IN), TLS header, Certificate Status (22):
{ [5 bytes data]
* TLSv1.3 (IN), TLS handshake, Server hello (2):
{ [122 bytes data]
* TLSv1.2 (IN), TLS header, Finished (20):
{ [5 bytes data]
* TLSv1.2 (IN), TLS header, Supplemental data (23):
{ [5 bytes data]
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
{ [25 bytes data]
* TLSv1.2 (IN), TLS header, Supplemental data (23):
{ [5 bytes data]
* TLSv1.3 (IN), TLS handshake, Certificate (11):
{ [2682 bytes data]
* TLSv1.2 (IN), TLS header, Supplemental data (23):
{ [5 bytes data]
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
{ [264 bytes data]
* TLSv1.2 (IN), TLS header, Supplemental data (23):
{ [5 bytes data]
* TLSv1.3 (IN), TLS handshake, Finished (20):
{ [52 bytes data]
* TLSv1.2 (OUT), TLS header, Finished (20):
} [5 bytes data]
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
} [1 bytes data]
* TLSv1.2 (OUT), TLS header, Supplemental data (23):
} [5 bytes data]
* TLSv1.3 (OUT), TLS handshake, Finished (20):
} [52 bytes data]
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server accepted to use http/1.1
* Server certificate:
* subject: CN=sohoftp.nascom.nasa.gov
* start date: Dec 6 22:29:40 2022 GMT
* expire date: Mar 6 22:29:39 2023 GMT
* subjectAltName: host "sohoftp.nascom.nasa.gov" matched cert's "sohoftp.nascom.nasa.gov"
* issuer: C=US; O=Let's Encrypt; CN=R3
* SSL certificate verify ok.
* TLSv1.2 (OUT), TLS header, Supplemental data (23):
} [5 bytes data]
> GET /solarsoft/sdo/aia/response/aia_V3_error_table.txt HTTP/1.1
> Host: sohoftp.nascom.nasa.gov
> User-Agent: curl/7.81.0
> Accept: */*
> Accept-encoding: gzip
> Range: bytes=0-71
>
* TLSv1.2 (IN), TLS header, Supplemental data (23):
{ [5 bytes data]
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
{ [57 bytes data]
* TLSv1.2 (IN), TLS header, Supplemental data (23):
{ [5 bytes data]
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
{ [57 bytes data]
* old SSL session ID is stale, removing
* TLSv1.2 (IN), TLS header, Supplemental data (23):
{ [5 bytes data]
* Mark bundle as not supporting multiuse
< HTTP/1.1 206 Partial Content
< Date: Wed, 11 Jan 2023 15:38:57 GMT
< Server: Apache
< Strict-Transport-Security: max-age=31536000; includeSubdomains;
< Last-Modified: Thu, 27 Sep 2012 13:22:00 GMT
< ETag: "72d-4caaed300ae00-gzip"
< Accept-Ranges: bytes
< Vary: Accept-Encoding
< Content-Encoding: gzip
< Content-Range: bytes 0-71/356
< Content-Length: 72
< Content-Type: text/plain
<
{ [72 bytes data]
100 72 100 72 0 0 1184 0 --:--:-- --:--:-- --:--:-- 1200
* Connection #0 to host sohoftp.nascom.nasa.gov left intact
DATE T_START
gzip: stdin: unexpected end of file
Using curl to request a later chunk, then piped to gunzip, to mimic parfive and aiohttp:
$ curl -v -H 'Accept-encoding: gzip' -H 'Range: bytes=142-213' https://sohoftp.nascom.nasa.gov/solarsoft/sdo/aia/response/aia_V3_error_table.txt | gunzip -
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0* Trying 198.118.248.247:443...
* Connected to sohoftp.nascom.nasa.gov (198.118.248.247) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* CAfile: /etc/ssl/certs/ca-certificates.crt
* CApath: /etc/ssl/certs
* TLSv1.0 (OUT), TLS header, Certificate Status (22):
} [5 bytes data]
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
} [512 bytes data]
* TLSv1.2 (IN), TLS header, Certificate Status (22):
{ [5 bytes data]
* TLSv1.3 (IN), TLS handshake, Server hello (2):
{ [122 bytes data]
* TLSv1.2 (IN), TLS header, Finished (20):
{ [5 bytes data]
* TLSv1.2 (IN), TLS header, Supplemental data (23):
{ [5 bytes data]
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
{ [25 bytes data]
* TLSv1.2 (IN), TLS header, Supplemental data (23):
{ [5 bytes data]
* TLSv1.3 (IN), TLS handshake, Certificate (11):
{ [2682 bytes data]
* TLSv1.2 (IN), TLS header, Supplemental data (23):
{ [5 bytes data]
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
{ [264 bytes data]
* TLSv1.2 (IN), TLS header, Supplemental data (23):
{ [5 bytes data]
* TLSv1.3 (IN), TLS handshake, Finished (20):
{ [52 bytes data]
* TLSv1.2 (OUT), TLS header, Finished (20):
} [5 bytes data]
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
} [1 bytes data]
* TLSv1.2 (OUT), TLS header, Supplemental data (23):
} [5 bytes data]
* TLSv1.3 (OUT), TLS handshake, Finished (20):
} [52 bytes data]
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server accepted to use http/1.1
* Server certificate:
* subject: CN=sohoftp.nascom.nasa.gov
* start date: Dec 6 22:29:40 2022 GMT
* expire date: Mar 6 22:29:39 2023 GMT
* subjectAltName: host "sohoftp.nascom.nasa.gov" matched cert's "sohoftp.nascom.nasa.gov"
* issuer: C=US; O=Let's Encrypt; CN=R3
* SSL certificate verify ok.
* TLSv1.2 (OUT), TLS header, Supplemental data (23):
} [5 bytes data]
> GET /solarsoft/sdo/aia/response/aia_V3_error_table.txt HTTP/1.1
> Host: sohoftp.nascom.nasa.gov
> User-Agent: curl/7.81.0
> Accept: */*
> Accept-encoding: gzip
> Range: bytes=142-213
>
* TLSv1.2 (IN), TLS header, Supplemental data (23):
{ [5 bytes data]
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
{ [57 bytes data]
* TLSv1.2 (IN), TLS header, Supplemental data (23):
{ [5 bytes data]
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
{ [57 bytes data]
* old SSL session ID is stale, removing
* TLSv1.2 (IN), TLS header, Supplemental data (23):
{ [5 bytes data]
* Mark bundle as not supporting multiuse
< HTTP/1.1 206 Partial Content
< Date: Wed, 11 Jan 2023 15:39:10 GMT
< Server: Apache
< Strict-Transport-Security: max-age=31536000; includeSubdomains;
< Last-Modified: Thu, 27 Sep 2012 13:22:00 GMT
< ETag: "72d-4caaed300ae00-gzip"
< Accept-Ranges: bytes
< Vary: Accept-Encoding
< Content-Encoding: gzip
< Content-Range: bytes 142-213/356
< Content-Length: 72
< Content-Type: text/plain
<
{ [72 bytes data]
100 72 100 72 0 0 947 0 --:--:-- --:--:-- --:--:-- 960
* Connection #0 to host sohoftp.nascom.nasa.gov left intact
gzip: stdin: not in gzip format
Here's the modified version of @alasdairwilson's example to show that the problem is with aiohttp, not with parfive:
>>> import asyncio
>>> import aiohttp
>>>
>>> async def main():
...     async with aiohttp.ClientSession(headers={'Range': 'bytes=142-213'}) as session:
...         async with session.get('https://sohoftp.nascom.nasa.gov/solarsoft/sdo/aia/response/aia_V3_error_table.txt') as response:
...             print("\n".join([f"{key}: {response.headers[key]}" for key in response.headers]))
...             html = await response.text()
...             print(f"Body:\n{html}")
...
>>> asyncio.run(main())
Date: Wed, 11 Jan 2023 21:13:24 GMT
Server: Apache
Strict-Transport-Security: max-age=31536000; includeSubdomains;
Last-Modified: Thu, 27 Sep 2012 13:22:00 GMT
Etag: "72d-4caaed300ae00-gzip"
Accept-Ranges: bytes
Vary: Accept-Encoding
Content-Encoding: gzip
Content-Range: bytes 142-213/356
Content-Length: 72
Content-Type: text/plain
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\ayshih\AppData\Local\mambaforge\envs\test4\lib\asyncio\runners.py", line 44, in run
return loop.run_until_complete(main)
File "C:\Users\ayshih\AppData\Local\mambaforge\envs\test4\lib\asyncio\base_events.py", line 646, in run_until_complete
return future.result()
File "<stdin>", line 5, in main
File "C:\Users\ayshih\AppData\Local\mambaforge\envs\test4\lib\site-packages\aiohttp\client_reqrep.py", line 1081, in text
await self.read()
File "C:\Users\ayshih\AppData\Local\mambaforge\envs\test4\lib\site-packages\aiohttp\client_reqrep.py", line 1037, in read
self._body = await self.content.read()
File "C:\Users\ayshih\AppData\Local\mambaforge\envs\test4\lib\site-packages\aiohttp\streams.py", line 349, in read
raise self._exception
aiohttp.client_exceptions.ClientPayloadError: 400, message='Can not decode content-encoding: gzip'
Okay, okay, okay. It is possible to turn off the automatic decompression of gzip-compressed responses by aiohttp by specifying (surprise) auto_decompress=False. Thus, since parfive is the one that is insisting on partial downloads, it should be the one to fix this issue. parfive needs to set auto_decompress=False, stitch together the partial responses via aiohttp into a single gzip-compressed response, and then decompress that single payload itself.
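The stitch-then-decompress idea can be sketched without any network access. This is only an illustration under stated assumptions: fetch_range is a hypothetical stand-in for a ranged GET whose body is left undecoded (which is what auto_decompress=False is meant to provide), stitch_and_decompress is a made-up helper name, and the payload is invented data.

```python
import gzip


def fetch_range(payload: bytes, start: int, end: int) -> bytes:
    """Hypothetical stand-in for a ranged GET returning raw, still-compressed
    bytes start..end (inclusive), as with automatic decompression disabled."""
    return payload[start:end + 1]


def stitch_and_decompress(parts: dict[int, bytes], total: int) -> bytes:
    """Place each raw chunk at its byte offset, then gunzip the whole body once."""
    buffer = bytearray(total)
    for offset, chunk in parts.items():
        buffer[offset:offset + len(chunk)] = chunk
    return gzip.decompress(bytes(buffer))


# Invented data standing in for the server's gzip-encoded representation.
original = "".join(f"row {i} value {i * i}\n" for i in range(200)).encode()
payload = gzip.compress(original)
size = len(payload)  # plays the role of Content-Length

# Request 72-byte ranges; completion order should not matter, so reverse them.
step = 72
parts = {
    start: fetch_range(payload, start, min(start + step, size) - 1)
    for start in range(0, size, step)
}
out_of_order = dict(reversed(list(parts.items())))

restored = stitch_and_decompress(out_of_order, size)
print(restored == original)
```

Decompressing once, after reassembly, is what makes the chunk boundaries irrelevant: no individual range has to be a valid gzip stream on its own.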
Does this also happen when you set max_splits=1?
Just tried it out and that works (does not throw an exception).
The download succeeds with max_splits=1. See my earlier comment.