timhutton / twitter-archive-parser

Python code to parse a Twitter archive and output in various ways

Failing to download video because of "exception: 'content-length'"

haikusw opened this issue

For example:

./parser-output/media/1599509571449819136-QgWxNNfL-_1MwecF.mp4: FAIL. Media couldn't be retrieved from https://video.twimg.com/ext_tw_video/1599509499609780224/pu/vid/720x1280/QgWxNNfL-_1MwecF.mp4?tag=12 because of exception: 'content-length'

When I do curl -I "https://video.twimg.com/ext_tw_video/1599509499609780224/pu/vid/720x1280/QgWxNNfL-_1MwecF.mp4?tag=12"

I get:

HTTP/2 200
access-control-allow-origin: *
access-control-expose-headers: Content-Length
cache-control: max-age=604800, must-revalidate
content-type: video/mp4
date: Sat, 17 Dec 2022 20:53:12 GMT
last-modified: Sun, 04 Dec 2022 21:00:36 GMT
perf: 7626143928
server: tsa_b
server-timing: x-cache;desc= MISS,x-tw-cdn;desc=VZ
surrogate-key: ext_tw_video ext_tw_video/bucket/4 ext_tw_video/1599509499609780224
timing-allow-origin: https://twitter.com, https://mobile.twitter.com
x-cache: MISS
x-connection-hash: de05f8a8f2a7ce41595bd83d4ff7ba9df5a1900542be14ede58132f0e4632e1b
x-content-type-options: nosniff
x-response-time: 150
x-transaction-id: 9a09bfab92915299
x-tw-cdn: VZ
x-tw-cdn: VZ
x-tw-cdn: VZ
content-length: 5999388

Seeing the same error for smaller files like this one:

% curl -I "https://video.twimg.com/ext_tw_video/1457248156870053890/pu/vid/740x678/0EAkSJin4P9CJ5u9.mp4?tag=12"

HTTP/2 200
accept-ranges: bytes
access-control-allow-origin: *
access-control-expose-headers: Content-Length
age: 217
cache-control: max-age=604800, must-revalidate
content-type: video/mp4
date: Sat, 17 Dec 2022 20:44:47 GMT
last-modified: Sun, 07 Nov 2021 07:24:49 GMT
perf: 7626143928
server: ECAcc (sec/9797)
server-timing: x-cache;desc= HIT,x-tw-cdn;desc=VZ
surrogate-key: ext_tw_video ext_tw_video/bucket/1 ext_tw_video/1457248156870053890
timing-allow-origin: https://twitter.com, https://mobile.twitter.com
x-cache: HIT
x-connection-hash: 33f5b752c5e4566a2f8f7bff495cb24b2a3cfb4ea9f77dce961304ca0992dd3c
x-content-type-options: nosniff
x-response-time: 89
x-transaction-id: 80f2956a1b502466
x-tw-cdn: VZ
x-tw-cdn: VZ
x-tw-cdn: VZ
content-length: 82663

Oddly, I often don't see the "Requesting headers for" progress line for these.

But sometimes I do:

586/3951 ./parser-output/media/1538286376952602624-S7LCF7ipDdhzjLkL.mp4: Requesting headers for https://video.twimg.com/ext_tw_video/1538091948464013313/pu/vid/720x540/S7LCF7ipDdhzjL

followed by:

586/3951 ./parser-output/media/1538286376952602624-S7LCF7ipDdhzjLkL.mp4: FAIL. Media couldn't be retrieved from https://video.twimg.com/ext_tw_video/1538091948464013313/pu/vid/720x540/S7LCF7ipDdhzjLkL.mp4?tag=12 because of exception: 'content-length'

I'm seeing this exact same issue.
Any ideas?
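
One possible explanation, offered as a guess: the message "exception: 'content-length'" has the shape of a Python KeyError raised when a header key is looked up but not present. That would mean the response the parser actually read did not carry a Content-Length header, even though the curl -I (HEAD) responses above do; servers are allowed to omit it. The parser's download code isn't shown in this thread, so the sketch below is not its implementation. It only illustrates treating the header as optional, and the function names parse_content_length and save_chunks are hypothetical.

```python
import io
from typing import BinaryIO, Iterable, Mapping, Optional


def parse_content_length(headers: Mapping[str, str]) -> Optional[int]:
    """Return the declared body size in bytes, or None if missing or invalid."""
    for name, value in headers.items():
        # HTTP header names are case-insensitive, so compare in lowercase.
        if name.lower() == "content-length":
            try:
                size = int(value)
            except (TypeError, ValueError):
                return None
            return size if size >= 0 else None
    return None


def save_chunks(chunks: Iterable[bytes], out: BinaryIO, total: Optional[int]) -> int:
    """Write chunks to out; verify the byte count only when total is known."""
    written = 0
    for chunk in chunks:
        out.write(chunk)
        written += len(chunk)
    if total is not None and written != total:
        raise IOError(f"expected {total} bytes, got {written}")
    return written


if __name__ == "__main__":
    # Header values taken from the second curl output above.
    with_length = {"content-type": "video/mp4", "content-length": "82663"}
    without_length = {"content-type": "video/mp4"}
    print(parse_content_length(with_length))
    print(parse_content_length(without_length))

    # With no declared size, the download still completes; only the
    # size check (and any percentage-based progress display) is skipped.
    buffer = io.BytesIO()
    print(save_chunks([b"ab", b"cd"], buffer, total=None))
```

If the download code uses the requests library (an assumption), response.headers is already a case-insensitive mapping, so response.headers.get("content-length") returns None rather than raising when the header is absent.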