How to process what's already downloaded

Question

schlingel opened this issue 4 years ago · comments

schlingel commented 4 years ago

I have let it run for multiple days now. It has downloaded all pages and a bunch of assets (around 2500 of 4528).

But I guess the soup service has been shut down. I only get 503s, and not even the logo is available at the assets URL anymore.

Is there a way of "materializing" what's already in the cache without the need to finish the download of all assets?

Pumpkineer · Answer 1 · Tue Jul 21 2020 15:44:49 GMT+0800 (China Standard Time)

1up from me. It appears the site is closed for good. Is there a way to finalize the process without downloading all assets?

Cheers!

schlingel · Answer 2 · Tue Jul 21 2020 18:32:16 GMT+0800 (China Standard Time)

@Pumpkineer Hey, I just tried the new IPs for the soup servers and they seem to get me some additional assets. Try it too, maybe you can get a few more assets (if not all) out of it! @nathell updated the README with the IPs and instructions for updating the hosts file.

Daniel Janus · Answer 3 · Tue Jul 21 2020 18:45:36 GMT+0800 (China Standard Time)

Yeah, the /etc/hosts workaround should work for now. I'll leave this issue open, though, because I do want to make it possible to finalize the process. It might take a few days.

dragon99919 · Answer 4 · Wed Jul 22 2020 16:52:46 GMT+0800 (China Standard Time)

Mine just gave up on the last file, saying: Received fatal alert: handshake_failure.
I guess I was lucky to make my copy in time? :P

schlingel · Answer 5 · Wed Jul 22 2020 17:37:42 GMT+0800 (China Standard Time)

@nathell That would be great. I'm still missing 300 assets, and only one new asset has been downloaded since yesterday.

Daniel Janus · Answer 6 · Wed Jul 22 2020 17:46:22 GMT+0800 (China Standard Time)

@dragon99919 @schlingel If you're still facing this, look at the end of log/skyscraper.log and see which URLs it's trying (and failing) to download. I have received a report that you might have to add extra domains to the hosts file; specifically

45.153.143.248 0.asset.soup.io

but maybe also others (depending on which URLs it's having trouble with).
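For anyone who would rather not read the log by eye, here is a rough Python sketch of the check described above. It assumes (this is not confirmed anywhere in the thread) that log/skyscraper.log contains plain http:// or https:// URLs in its lines. The function names `hosts_in_log_tail`, `hosts_entries` and `summarize` are made up for this example. It counts every hostname seen near the end of the log, not only the failing ones, so you still have to check which of them actually fail.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Assumption: URLs appear in the log as plain http(s)://... tokens.
URL_RE = re.compile(r"https?://[^\s\"'<>\[\]]+")


def hosts_in_log_tail(log_lines, tail=200):
    """Count hostnames of all URLs found in the last `tail` log lines."""
    counts = Counter()
    for line in log_lines[-tail:]:
        for url in URL_RE.findall(line):
            try:
                host = urlparse(url).hostname
            except ValueError:
                continue  # skip anything that does not parse as a URL
            if host:
                counts[host] += 1
    return counts


def hosts_entries(hosts, ip):
    """Format hosts-file lines mapping each hostname to `ip`."""
    return [f"{ip} {host}" for host in sorted(hosts)]


def summarize(path="log/skyscraper.log"):
    """Print the hostnames seen near the end of the log, most frequent first."""
    with open(path, encoding="utf-8", errors="replace") as f:
        lines = f.readlines()
    for host, n in hosts_in_log_tail(lines).most_common():
        print(f"{n:5d}  {host}")
```

Call `summarize()` from the directory you run the scraper in. `hosts_entries(["0.asset.soup.io"], "45.153.143.248")` produces the line quoted above. The right IP for any other hostname is not known from this thread, so check the README before adding it.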

dragon99919 · Answer 7 · Wed Jul 22 2020 19:11:36 GMT+0800 (China Standard Time)

Worked like a charm, thanks!
Maybe adding this to the README would help prevent future issues with it?

Martin Keiblinger · Answer 8 · Wed Aug 19 2020 20:30:49 GMT+0800 (China Standard Time)

@nathell Any news on finalizing the process?