ArchiveTeam / grab-site

The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns

My computer crashed. I'm 10 GB into a crawl. How can I "resume" this crawl?

komali2 opened this issue.

Partway through crawling a website, my computer crashed. I'd like to resume this crawl without re-downloading 10 GB of data. How can I do so?

My command was:

grab-site https://hk.appledaily.com/ --dir=/media/caleb/'bighdd'/appledaily3 --finished-warc-dir=/media/caleb/'bighdd'/appledaily

If I try to run the same command again, I get the error: File exists: '/media/caleb/bighdd/appledaily3'

This is not currently supported: #58

OK, thanks for letting me know. I will restart the archive!
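Resuming is not supported, and re-running with the same --dir fails with "File exists". That means a restart needs a --dir path that does not exist yet. The sketch below is one way to pick such a path automatically. It assumes you are fine with suffixing the directory name (appledaily3-2, appledaily3-3, ...). The helper next_free_dir is hypothetical and is not part of grab-site. The script only prints a command that uses the two flags from the original command. It does not recover the 10 GB already downloaded; the crawl starts over.

```python
import os
import shlex
import tempfile  # used only by the quick self-check below


def next_free_dir(base: str) -> str:
    """Hypothetical helper: return `base` if it doesn't exist yet,
    otherwise the first unused `base-2`, `base-3`, ... path."""
    if not os.path.exists(base):
        return base
    n = 2
    while os.path.exists(f"{base}-{n}"):
        n += 1
    return f"{base}-{n}"


if __name__ == "__main__":
    # Quick self-check in a throwaway directory.
    with tempfile.TemporaryDirectory() as root:
        b = os.path.join(root, "appledaily3")
        os.mkdir(b)
        assert next_free_dir(b) == b + "-2"

    # Build (but do not run) a restart command with a fresh --dir.
    target = next_free_dir("/media/caleb/bighdd/appledaily3")
    cmd = [
        "grab-site",
        "https://hk.appledaily.com/",
        f"--dir={target}",
        "--finished-warc-dir=/media/caleb/bighdd/appledaily",
    ]
    print(shlex.join(cmd))  # shlex.join needs Python 3.8+
```

Checking and then creating the directory is not atomic. If two crawls could start at the same moment, a different approach would be needed. For a single manual restart, this check is enough.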