Stop iterating on the content that is 404'd or DMCA'd
fl0werpowers opened this issue
Some media referenced in the archive no longer exists, either because the original uploader deleted it or because it was taken down by a DMCA claim. The tool already reports both cases clearly (as 'Download failed with status "404 Not Found"' and 'Download failed with status "403 Forbidden"' respectively), and the 403 response body explicitly states that the content was struck by DMCA. Retrying these downloads on every run is a waste of time, so the tool should skip such media.
These are the exceptions in question:
FAIL. Media couldn't be retrieved from https://pbs.twimg.com/media/EbH_bxcUYAgxbki.png:orig because of exception: Download failed with status "404 Not Found". Response content: ""
FAIL. Media couldn't be retrieved from https://video.twimg.com/ext_tw_video/1560406436982804480/pu/vid/1280x720/m7-vUTLunERc4auB.mp4?tag=12 because of exception: Download failed with status "403 Forbidden". Response content: "{"error_code":2,"error_response":"Dmcaed"}"
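One possible way to do this, as a rough sketch and not the tool's actual code: treat a 404, or a 403 whose body carries `"error_response":"Dmcaed"`, as permanent, and remember those URLs between runs. The function names and the idea of a JSON skip-list file are hypothetical, chosen only for illustration.

```python
import json


def is_permanent_failure(status_code: int, body: str) -> bool:
    """Return True when retrying the download cannot succeed.

    Assumption: a 404, or a 403 whose JSON body has
    error_response == "Dmcaed", will never recover.
    """
    if status_code == 404:
        return True
    if status_code == 403:
        try:
            payload = json.loads(body)
        except ValueError:
            # Empty or non-JSON body: could be a temporary block, so retry later.
            return False
        return isinstance(payload, dict) and payload.get("error_response") == "Dmcaed"
    return False


def load_skip_list(path: str) -> set:
    """Load URLs that failed permanently in earlier runs (hypothetical file)."""
    try:
        with open(path, encoding="utf-8") as f:
            return set(json.load(f))
    except FileNotFoundError:
        return set()


def save_skip_list(path: str, urls: set) -> None:
    """Persist the skip list so later runs can skip these URLs."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(sorted(urls), f, indent=2)
```

Before each download the tool could check whether the URL is in the loaded set, and after a failure add the URL only when `is_permanent_failure` returns True. Other 403s and server errors would still be retried on the next run.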
Agreed. I was going to raise this issue myself. Thanks for the well-written issue.