timhutton / twitter-archive-parser

Python code to parse a Twitter archive and output in various ways


Stop iterating on the content that is 404'd or DMCA'd

fl0werpowers opened this issue

Some media referenced in an archive no longer exists, either because the original uploader deleted it or because it was taken down by a DMCA claim. The tool already reports these cases as 'Download failed with status "404 Not Found"' and 'Download failed with status "403 Forbidden"' respectively, and the 403 response body states explicitly that the content was removed under DMCA. Retrying such downloads on every pass wastes time; the tool should skip them.
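One way to skip such media on later passes would be to remember the URLs that failed permanently. The following is only a rough sketch of that idea, not the parser's actual code: the `SkipList` class and the file name used with it are hypothetical.

```python
import json
from pathlib import Path


class SkipList:
    """Hypothetical helper: remembers URLs whose download failed permanently."""

    def __init__(self, path):
        self.path = Path(path)
        # Load URLs recorded by an earlier run, if the file exists.
        if self.path.exists():
            self.urls = set(json.loads(self.path.read_text(encoding="utf-8")))
        else:
            self.urls = set()

    def should_skip(self, url):
        """Return True if this URL was already recorded as a permanent failure."""
        return url in self.urls

    def record(self, url):
        """Add a URL and write the whole list back to disk."""
        self.urls.add(url)
        self.path.write_text(
            json.dumps(sorted(self.urls), indent=2), encoding="utf-8"
        )
```

A download loop could call `should_skip(url)` before each request and `record(url)` after a 404 or a DMCA 403, so that a second run over the same archive does not request those URLs again.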

commented

These are the exceptions in question:

FAIL. Media couldn't be retrieved from https://pbs.twimg.com/media/EbH_bxcUYAgxbki.png:orig because of exception: Download failed with status "404 Not Found". Response content: ""

FAIL. Media couldn't be retrieved from https://video.twimg.com/ext_tw_video/1560406436982804480/pu/vid/1280x720/m7-vUTLunERc4auB.mp4?tag=12 because of exception: Download failed with status "403 Forbidden". Response content: "{"error_code":2,"error_response":"Dmcaed"}"
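The two failures above could be told apart from transient errors with a small check on the status code and response body. This is a sketch under assumptions: the function name `is_permanent_failure` is made up for illustration, and treating every 404 as permanent is a guess based only on the messages shown here.

```python
import json


def is_permanent_failure(status_code, body):
    """Return True when retrying the download is unlikely to help.

    Assumption: a 404 means the media was deleted, and a 403 is permanent
    only when the body carries the "Dmcaed" marker seen in the log above.
    """
    if status_code == 404:
        return True
    if status_code == 403:
        try:
            payload = json.loads(body)
        except (TypeError, ValueError):
            # Body is empty or not JSON: this may be a different kind of 403.
            return False
        return isinstance(payload, dict) and payload.get("error_response") == "Dmcaed"
    return False
```

Any other 403 is left as retryable here, since it is not clear from these logs whether such responses are always final.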

Agreed. I was going to raise this issue myself. Thanks for the well-written issue.