andyjsmith / SmugMug-Downloader

Download all the images from a SmugMug user

CPU bound when files are already downloaded

obadz opened this issue

First off, thank you for this tool. It's amazing that in ~100 lines of code you put together something that gets the job done with no faff, and with a nice UI to boot!

The only thing I find odd is that if you restart it against a partially downloaded collection, it uses 100% CPU for a long time until it gets back to the point where it left off. At first I thought it was checking hashes, but reading the code I can see that's not the case.

I suspect get_json may be slow, since it's pretty much the only thing that happens in the inner loop.
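One way to test the hunch that get_json dominates is to profile the loop with Python's built-in cProfile and pstats modules. The sketch below is a minimal, self-contained illustration only: get_json here is a hypothetical stand-in that just parses a made-up JSON payload (the real function also performs API requests), and inner_loop and profile_report are illustrative names that are not part of the tool.

```python
import cProfile
import io
import json
import pstats


def get_json(payload: str) -> dict:
    # Hypothetical stand-in for smdl.py's get_json: it only parses a JSON string,
    # whereas the real function also performs a network request.
    return json.loads(payload)


def inner_loop(iterations: int) -> int:
    # Made-up payload loosely resembling a page of image records.
    payload = json.dumps({"images": [{"name": f"img{i}.jpg"} for i in range(50)]})
    seen = 0
    for _ in range(iterations):
        seen += len(get_json(payload)["images"])
    return seen


def profile_report(iterations: int = 200) -> str:
    # Profile the loop and return the stats table sorted by cumulative time.
    profiler = cProfile.Profile()
    profiler.enable()
    inner_loop(iterations)
    profiler.disable()
    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats()
    return out.getvalue()


if __name__ == "__main__":
    print(profile_report())
```

For an actual run of the tool, `python -m cProfile -s cumulative smdl.py` followed by the script's usual arguments prints the same kind of table, which would show whether time goes to request handling, JSON parsing, or something else.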

The maintainer replied:

The issue you're running into is that there isn't any real resume functionality. The script starts downloading from the beginning every time and simply skips any image that already exists on disk:

From SmugMug-Downloader/smdl.py, lines 101 to 103 in fc445e7:

```python
# Skip if image has already been saved
if os.path.isfile(image_path):
    continue
```
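To see why a restart still does so much work, here is a rough, hypothetical approximation of that loop (not the actual smdl.py code). The album's image list comes from the API before the file check runs, so a restart repeats every listing request even when all files are already present. fetch_album_images, download, and simulate_restart are made-up names standing in for the real API and download logic.

```python
import os
import tempfile


def download_all(albums, dest, fetch_album_images, download):
    """Hypothetical approximation of the skip-if-exists loop."""
    for album in albums:
        # The image list comes from the API, so this request runs on every pass,
        # even if all of the album's files are already on disk.
        for image_name in fetch_album_images(album):
            image_path = os.path.join(dest, album, image_name)
            # Skip if image has already been saved
            if os.path.isfile(image_path):
                continue
            download(album, image_name, image_path)


def simulate_restart():
    """Run the loop twice against a fake catalog and count the simulated calls."""
    calls = {"fetch": 0, "download": 0}
    catalog = {"holiday": ["a.jpg", "b.jpg"], "party": ["c.jpg"]}

    def fetch_album_images(album):
        calls["fetch"] += 1  # stands in for one API request
        return catalog[album]

    def download(album, image_name, image_path):
        calls["download"] += 1
        os.makedirs(os.path.dirname(image_path), exist_ok=True)
        with open(image_path, "wb") as f:
            f.write(b"fake image bytes")

    with tempfile.TemporaryDirectory() as dest:
        download_all(catalog, dest, fetch_album_images, download)  # first run
        download_all(catalog, dest, fetch_album_images, download)  # "restart"
    return calls


if __name__ == "__main__":
    print(simulate_restart())  # {'fetch': 4, 'download': 3}
```

The second pass downloads nothing but still repeats every listing request; in the real tool, each of those is an API request.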

A proper implementation might read the files on disk or save a progress file to determine where to resume, but right now the script just loops through API requests and file checks until it reaches an image that isn't on disk yet. You're welcome to submit a PR if you have an improvement, but I don't have time to work on this change right now.
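The progress-file idea could look something like the sketch below. Once every image in an album is on disk, the album's name is recorded in a JSON file, and on restart those albums are skipped before any API request is made. The file name progress.json and the functions load_progress, save_progress, download_all_resumable, and simulate_restart are hypothetical. The sketch also assumes an album's contents don't change between runs, so new uploads to an already finished album would be missed.

```python
import json
import os
import tempfile

PROGRESS_FILE = "progress.json"  # hypothetical file name


def load_progress(dest):
    """Return the set of album names completed in an earlier run."""
    path = os.path.join(dest, PROGRESS_FILE)
    if not os.path.isfile(path):
        return set()
    with open(path, encoding="utf-8") as f:
        return set(json.load(f))


def save_progress(dest, done):
    """Write to a temp file, then swap it in with os.replace, so an interrupted
    write can't leave a truncated progress file behind."""
    path = os.path.join(dest, PROGRESS_FILE)
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump(sorted(done), f)
    os.replace(tmp_path, path)


def download_all_resumable(albums, dest, fetch_album_images, download):
    done = load_progress(dest)
    for album in albums:
        if album in done:
            continue  # no API request for albums finished in an earlier run
        for image_name in fetch_album_images(album):
            image_path = os.path.join(dest, album, image_name)
            if not os.path.isfile(image_path):
                download(album, image_name, image_path)
        # Only mark the album done after every image in it is on disk.
        done.add(album)
        save_progress(dest, done)


def simulate_restart():
    """Run twice against a fake catalog and count the simulated calls."""
    calls = {"fetch": 0, "download": 0}
    catalog = {"holiday": ["a.jpg", "b.jpg"], "party": ["c.jpg"]}

    def fetch_album_images(album):
        calls["fetch"] += 1  # stands in for one API request
        return catalog[album]

    def download(album, image_name, image_path):
        calls["download"] += 1
        os.makedirs(os.path.dirname(image_path), exist_ok=True)
        with open(image_path, "wb") as f:
            f.write(b"fake image bytes")

    with tempfile.TemporaryDirectory() as dest:
        download_all_resumable(catalog, dest, fetch_album_images, download)
        download_all_resumable(catalog, dest, fetch_album_images, download)
    return calls


if __name__ == "__main__":
    print(simulate_restart())  # {'fetch': 2, 'download': 3}
```

Marking only whole albums keeps the progress file small. An album interrupted mid-download is simply listed again on the next run, where the existing per-file check still skips the images it already has.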