Exceen / 4chan-downloader

Python3 script to continuously download all images/webms of multiple 4chan threads simultaneously - without installation

Add a control for maximum download speed

ZinRicky opened this issue · comments

At the moment, the script monopolises the bandwidth. It would be nice to have an optional input argument to limit how much of the connection is used.

I'm working on an argument to disable multiprocessing (downloading one thread at a time) and to sleep between every image download.
Both of these options are really slow and inefficient. The best way to rate-limit the downloads would be to stream the images/videos, which is not possible with the version of urllib that the program currently uses. So I'll make a branch using urllib3 or requests, and it's up to @Exceen to decide whether he wants to add it to the program, because that could bring big changes to the codebase.
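For what it's worth, a chunked, rate-limited download can be sketched with the standard library alone, since `urllib.request.urlopen` returns a file-like response that can be read in chunks. The names here (`download_rate_limited`, `sleep_needed`, `max_bps`) are illustrative, not part of the script:

```python
import time
import urllib.request

def sleep_needed(downloaded_bytes, elapsed_s, max_bps):
    """Seconds to pause so the average rate stays at or below max_bps."""
    target_elapsed = downloaded_bytes / max_bps
    return max(0.0, target_elapsed - elapsed_s)

def download_rate_limited(url, path, max_bps=500_000, chunk_size=64 * 1024):
    """Stream url to path in chunks, sleeping whenever we run ahead of max_bps."""
    start = time.monotonic()
    downloaded = 0
    with urllib.request.urlopen(url) as resp, open(path, "wb") as out:
        while True:
            chunk = resp.read(chunk_size)
            if not chunk:
                break
            out.write(chunk)
            downloaded += len(chunk)
            time.sleep(sleep_needed(downloaded, time.monotonic() - start, max_bps))
```

Since the pacing lives in `sleep_needed`, the limit can be tuned (or disabled by passing a huge `max_bps`) without touching the download loop.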

I don't have a problem with using urllib3 or requests, but I'm not sure it makes sense to disable multiprocessing. As you say yourself, it's heavily inefficient. I think I even missed out on some threads back when this script was still single-threaded because it was downloading everything too slowly. Is there any reason to make it single-threaded besides limiting the bandwidth?
What about having one process per (4chan-)thread which just checks for new images and, instead of downloading them, puts them on a queue? Then a separate process works through the queue image by image. Maybe that would be more efficient than your current approach while still keeping the option to limit the bandwidth. You would go slightly over the specified limit because the page loads are not within this scope, but that shouldn't be a problem, I guess.

I like your idea a lot more. My plan is to do that and prioritize downloading threads that are about to 404 or get archived. Also, the part about needing urllib3 or requests was my mistake: it's possible without those libraries, I just forgot.