nathell / soupscraper

gimme, I have a dying soup

How to process what's already downloaded

schlingel opened this issue

I have let it run for multiple days now. It has downloaded all pages and a bunch of assets (around 2500 of 4528).

But I guess the soup service has been shut down. I only get 503s, and not even the logo is available at the assets URL anymore.

Is there a way of "materializing" what's already in the cache without the need to finish the download of all assets?

1up from me. It appears the site is closed for good. Is there a way to finalize the process without downloading all assets?

Cheers!

@Pumpkineer Hey, I just tried the new IPs for the soup servers and they seem to get me some additional assets. Try it too, maybe you can get a few more assets (if not all) out of it! @nathell updated the README with the IPs and instructions for updating the hosts file.

Yeah, the /etc/hosts workaround should work for now. I'll leave this issue open, though, because I do want to make it possible to finalize the process. Might take a few days though.

Mine just gave up on the last file, saying: Received fatal alert: handshake_failure.
I guess I was lucky to make my copy in time? :P

@nathell That would be great. I'm still missing 300 assets and since yesterday only one new one could be downloaded.

@dragon99919 @schlingel If you're still facing this, try looking at the end of log/skyscraper.log to see which URLs it's trying (and failing) to download. I have received a report that you might have to add extra domains to the hosts file; specifically:

45.153.143.248 0.asset.soup.io

but maybe also others (depending on which URLs it's having trouble with).
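
To find out which hosts those are, something like the following Python sketch might help. It is only an illustration, not part of soupscraper: it assumes the failing URLs appear verbatim in the log and point at soup.io subdomains, and the helper names (`hosts_in_log_tail`, `hosts_file_lines`, `scan_log`) are made up for this example. Whether the same IP serves every asset host is also an assumption; check the IPs listed in the README.

```python
import re
from collections import Counter

# Assumption: URLs are logged verbatim and use soup.io subdomains.
HOST_RE = re.compile(r"https?://([A-Za-z0-9.-]+\.soup\.io)\b")


def hosts_in_log_tail(lines, tail=200):
    """Count soup.io hostnames mentioned in the last `tail` log lines."""
    counts = Counter()
    for line in lines[-tail:]:
        counts.update(host.lower() for host in HOST_RE.findall(line))
    return counts


def hosts_file_lines(hostnames, ip):
    """Format hosts-file entries that map each hostname to `ip`."""
    return [f"{ip} {name}" for name in sorted(hostnames)]


def scan_log(path="log/skyscraper.log", tail=200):
    """Read the log at `path` and count hostnames near its end."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return hosts_in_log_tail(f.readlines(), tail)
```

Calling `scan_log()` from the directory you run the scraper in returns a counter of hostnames seen near the end of the log. Passing those names to `hosts_file_lines` together with an IP from the README gives lines you can paste into the hosts file.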

Worked like a charm, thanks!
Maybe adding this to the README would help prevent future issues with it?

@nathell Any news on finalizing the process?