chbrown / overdrive

Bash script to download mp3s from the OverDrive audiobook service

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Partial downloads aren't finished when rerunning

grr opened this issue · comments

commented

Program was interrupted during a download and upon rerunning, the last partially downloaded file was skipped because it already existed, even though it was not completely downloaded. Instead of just checking if the file exists, you need to also check the byte count. Probably could just let curl handle that by passing it --continue-at -.

i ran into this. couldn't figure out what to change in the code, so i wound up downloading all the parts again.

Yeah, sorry, this comes up so rarely that I haven't given it much attention.

That's pretty much what I do, but for the record, you only need to delete the (last) incomplete file. But also totally fine to delete them all and rerun, like you mentioned.

commented

Passing --continue-at - to curl should allow it to resume downloading partial downloads without any needed manual intervention (except needing to run the program again). And adding --retry 3 would reduce the need for that a bit as well. The more automation, the better.

Right, except that I do an existence check at https://github.com/chbrown/overdrive/blob/2.3.1/overdrive.sh#L299 in order to avoid hitting the server at all if the file was already downloaded. Otherwise, curl has to send at least a HEAD request in order to determine whether the local file is a complete representation of the given URL.

Since the Overdrive server responds differently after the license expires, this means (potentially accidentally) re-running overdrive download ... could corrupt local files that were actually totally fine.

I think the right solution would be to try to detect when curl exits prematurely (before retrieving the complete file), delete the incomplete file, and print a message asking the user to try again. And advise them to check the logs/inputs first, because I don't want to thrash against the Overdrive server with a request that's never going to succeed, possibly triggering some red flag within Overdrive's monitoring, resulting in banning the user's IP or something. I have no idea what's going on behind the scenes at Overdrive, but I do know that with other APIs I've worked with, it pays to be gentle and fly under the radar.

Something like the --retry 3 option you suggested is a good idea; it's probably low-risk, even considering the previous paragraph, but (as I mentioned in a previous comment) I run into this so rarely that I don't know what conditions typically lead to ending with partial downloads, and whether simply retrying would resolve them.

commented

I think the right solution would be to try to detect when curl exits prematurely (before retrieving the complete file),

definitely the better solution

I run into this so rarely that I don't know what conditions typically lead to ending with partial downloads, and whether simply retrying would resolve them.

this happens to me frequently because i have spotty wifi and it will often drop connections

Ok, as of version 2.3.2 I've added that --retry 3 flag to all curl calls.

I "tested" it by briefly flicking my wifi off and back on while downloading a book, and ended up with the same output as without the transitory outage. Probably won't work if the outage is longer than several seconds, but better than nothing.

Closing this because I don't relish the idea of sinking the requisite time/effort to implement the "right solution" into a project that's liable to go entirely obsolete any day now (due to Libby)... but if you fiddle around with the options and come up with a good combination or some other solution that addresses your connectivity issues while avoiding the file-clobbering concern I mentioned above, please submit a PR!