Performance issue (eating RAM)

Question

mbnoimi opened this issue a month ago

Muhammad Bashir Al-Noimi commented a month ago

Hi,

I'm downloading a website to a depth of 3 within the same domain. My laptop has 16 GB of RAM.
Within less than 3 hours, the extension had pushed RAM usage to 90%, which forced me to hard-restart my laptop.
This issue occurs with big websites only (my website is about 1.5 GB, mostly pure HTML).

Is there any workaround to improve the performance?

Linux Mint 21.3 Xfce
Firefox 126.0 (64-bit)
Save captured data to: Scrapbook folder
Save captured data as: Folder

Danny Lin · Answer 1 · Mon May 27 2024 20:40:25 GMT+0800 (China Standard Time)

There's probably not much you can do besides upgrading the hardware. Saving to the backend server may perform better in some cases, though.

Muhammad Bashir Al-Noimi · Answer 2 · Mon May 27 2024 20:43:50 GMT+0800 (China Standard Time)

> There's probably not much you can do besides upgrading the hardware. Saving to the backend server may perform better in some cases, though.

I use WebHTTrack and it works pretty well, but for some reason my cookies don't work properly with it. That's why I use WebScrapBook: it deals with cookies behind the scenes.

Muhammad Bashir Al-Noimi · Answer 3 · Mon May 27 2024 21:03:15 GMT+0800 (China Standard Time)

> There's probably not much you can do besides upgrading the hardware.

BTW, why does WebScrapBook store all the scraped data in memory and then save it in the last step? Why doesn't it save pages one by one, just like wget and HTTrack?

Danny Lin · Answer 4 · Mon May 27 2024 21:25:17 GMT+0800 (China Standard Time)

> BTW, why does WebScrapBook store all the scraped data in memory and then save it in the last step? Why doesn't it save pages one by one, just like wget and HTTrack?

This is not true. Intermediate data is mostly saved to browser storage, which ultimately lives on disk in some form.

The browser extension API is so limited that it cannot load files that have already been downloaded to the local filesystem. When capturing multiple web pages, each saved page needs to be loaded again so that its links to other downloaded pages can be rewritten, and that is not possible until all pages have been downloaded. As a result, we have to keep all downloaded pages in browser storage, rewrite them, and only then save them to the local filesystem.
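
To illustrate why the rewrite has to wait until the end, here is a minimal Python sketch of the two-pass idea. It is not WebScrapBook's actual code: the `rewrite_links` function, the `page0.html` naming scheme, and the plain dict standing in for browser storage are all assumptions made for illustration, and the regex only handles simple double-quoted `href` attributes.

```python
import re
from typing import Dict


def rewrite_links(pages: Dict[str, str]) -> Dict[str, str]:
    """Map captured pages (URL -> HTML) to local files (filename -> HTML).

    `pages` is a hypothetical stand-in for the intermediate storage; it must
    already contain every captured page before this function is called.
    """
    # Pass 1: assign a local filename to every captured URL. This mapping is
    # only complete once all pages have been downloaded.
    url_to_file = {url: "page{}.html".format(i) for i, url in enumerate(pages)}

    href = re.compile(r'href="([^"]*)"')

    def replace(match):
        target = match.group(1)
        # Links to pages outside the capture are left pointing at the web.
        return 'href="{}"'.format(url_to_file.get(target, target))

    # Pass 2: rewrite each page's links using the complete mapping.
    return {url_to_file[url]: href.sub(replace, html) for url, html in pages.items()}


captured = {
    "https://example.com/a": '<a href="https://example.com/b">b</a>',
    "https://example.com/b": '<a href="https://example.com/a">a</a>',
}
for name, html in rewrite_links(captured).items():
    print(name, html)
```

In this sketch the first page links to the second, whose local filename is not known while the first page is being captured. A tool that can reopen its own output files can patch them afterwards; per the explanation above, the extension cannot, so the pages stay in intermediate storage until the mapping is complete.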