kimbauters / ZIMply

An easy to use offline reader for ZIM files right in your browser!

Takes too much RAM

aweasadbek opened this issue

My PC specs:
Intel i5-2320
SSD 120 GB (Running OS)
HDD 1TB (wikipedia dump is here)
RAM 4GB
Swap 2 GB
Ubuntu 20.04.1 LTS

I'm running the full English Wikipedia with pictures. After starting the server I can browse several articles, but then the python3 process takes more than 4 GB of RAM and the computer freezes indefinitely, so I have to unplug it. Is this how it should behave, or am I doing something wrong?

I have both the .py file and the dump in the same folder. The dump file is 92 GB.
The .py file contains the following:

from zimply import ZIMServer
ZIMServer("$PATH/wikipedia_en_all_maxi_2020-06.zim")

There are no known memory leaks in ZIMply. That being said, packages update all the time so it could very well be an issue with one of the underlying packages, or a problem in the code that now emerged because of a package change. In my own testing there is nothing that is out of the ordinary.

For the time being it doesn't look like I can help you out on this one. As always, do keep your Python and the packages up-to-date. If an update of any of the packages resolved this problem then please do let us know.

I'm not closing right now. If others are experiencing the same issue then it is something worth looking into further.

I have updated pip3.
Updated all packages using the following command:
pip3 list --outdated --format=freeze | grep -v '^\-e' | cut -d = -f 1 | xargs -n1 pip install -U
Reinstalled zimply.
But the problem is not fixed. I can load small pages without any trouble (python3 uses 20-90 MB of RAM), but pages with a lot of pictures load very slowly, and during the load python3's RAM usage increases from 20-90 MB to 4 GB.
How much RAM does python3 take on your computer while loading the "Elon Musk" page?
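
To get comparable numbers, the Python process can report its own peak memory. This is a minimal sketch, assuming a Unix-like system where the standard `resource` module is available. `peak_rss_mib` is a hypothetical helper and not part of ZIMply.

```python
import resource
import sys


def peak_rss_mib():
    """Return the peak resident set size of this process in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


if __name__ == "__main__":
    print(f"peak RSS: {peak_rss_mib():.1f} MiB")
```

The value only covers the process that calls it, so it would have to run inside the server process, for example from a function registered with `atexit` before the server starts.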

All those updates look fine.

I'm not using the same ZIM as you but instead the English wikivoyage (much smaller download size). I'm using it with Python 3.8, falcon 2.0.0, gevent 20.6.2, and mako 1.1.3 on a macOS 10.15.5 system. Memory use from Python never goes over 40MB.

Pretty much anything in a ZIM file is encoded in the same way, whether it is a webpage or an image. Since you can load small pages, it might be that there is something wrong with the ZIM file itself. Some encoding errors can cause memory use to blow up for no good reason. This is not related to ZIMply but to how image compression works. It is also not the first time there have been problems with incorrectly encoded ZIM files, so there are precedents here.

One thing in ZIMply you can try is to open the zimply.py file. At line 239 there is a line @lru_cache(maxsize=32). Simply delete this line and then restart ZIMply. The purpose of this line is to improve performance by caching earlier results. If there is indeed a problem with image decompression then the same image that is considerably too big would be stored in a cache for later retrieval, along with others that may have the same problem. There's no guarantee this will fix your issue – but it is worth a try.
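
The caching effect described above can be sketched with the standard library alone. This is an illustration only: `read_blob` is a hypothetical stand-in for an expensive read or decompression step, not the function that ZIMply decorates.

```python
from functools import lru_cache


@lru_cache(maxsize=32)
def read_blob(name):
    # Hypothetical stand-in for an expensive read/decompress step.
    return bytearray(1024 * 1024)  # 1 MiB per result


first = read_blob("a.jpg")
second = read_blob("a.jpg")      # served from the cache, same object
info = read_blob.cache_info()    # records one miss and one hit

# The undecorated function stays reachable via __wrapped__ and bypasses
# the cache, which is the behaviour you get after deleting the decorator.
uncached = read_blob.__wrapped__("a.jpg")
```

Because the cache holds a reference to every stored result, an oversized result stays in memory until it is evicted or `read_blob.cache_clear()` is called.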

I have commented out line 239 in ~/.local/lib/python3.8/site-packages/zimply/zimply.py but the problem is not fixed.

The process freezes during load of an image:
::ffff:127.0.0.1 - - [2020-08-06 14:45:11] "GET /I/m/Ulu%E1%B9%9Fu_(Ayers_Rock)%2C_Sunset.jpg HTTP/1.1" 200 6264 0.115406
While the page loads I can see its full text and the first images. The images at the bottom do not show up. The browser keeps loading while python3 is loading one image.
When I try to load that image separately I can do that easily without any problems. All images can be accessed separately.

And the browser extension of Kiwix works perfectly with my zim file.
I'm using this dump http://download.kiwix.org/zim/wikipedia/wikipedia_en_100_2020-06.zim.

Log file:

INFO: A ZIM file in the language en (ISO639-1) was found, containing 20029280 articles.
INFO: The index file is determined to be located at /home/aweasadbek/ASADBEK_AWE/CurrentLearnings/index.idx.
INFO: accessing the article: User:The other Kiwix guy/Landing
INFO: accessing the article: Music

If the process still freezes with that line commented out, it does indicate that there is something wrong with the ZIM file, as that line has no functional effect other than optimising performance. You could also try setting the value low, e.g. @lru_cache(maxsize=2).
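
How `maxsize` bounds what the cache keeps alive can be seen in a small experiment. This is a sketch under assumptions: `make_reader` and `retained_entries` are hypothetical names, and each fake result is fixed at 1 MiB.

```python
from functools import lru_cache


def make_reader(maxsize):
    @lru_cache(maxsize=maxsize)
    def read_blob(index):
        # Hypothetical stand-in for decompressing one image.
        return bytes(1024 * 1024)
    return read_blob


def retained_entries(reader, requests):
    # Request `requests` distinct items, then report how many results
    # the cache is still holding on to.
    for i in range(requests):
        reader(i)
    return reader.cache_info().currsize


default_reader = make_reader(32)
small_reader = make_reader(2)
```

After 100 distinct requests, `default_reader` still holds 32 results and `small_reader` holds 2, so a lower `maxsize` limits how many oversized results can pile up at once.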

Kiwix are the ones who create these ZIM files, so they have a far better understanding of where things can go wrong. It is not uncommon for a ZIM file to load with Kiwix but not with ZIMply. ZIMply strictly adheres to the ZIM standards and does not include any code to correct for possible issues with the ZIM file.

I used zimcheck from zimtools on my dump. Results:

[INFO] Checking zim file ../wikipedia_en_all_maxi_2020-06.zim
[INFO] Verifying Internal Checksum..
[ERROR] Wrong Checksum in ZIM file
[ERROR] Invalid checksum :
ZIM File Checksum in file: f1e101af77b6a67c3debe463b6056ae1

[INFO] Overall Test Status: Fail
[INFO] Total time taken by zimcheck: 1815 seconds.
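
For reference, a check like this can be approximated in a few lines of Python. This is a rough sketch, assuming the openZIM header layout: an 80-byte header, an 8-byte little-endian checksumPos field at offset 72, and a 16-byte MD5 of all preceding bytes stored at that position. `verify_zim_checksum` is a hypothetical helper and not how zimcheck itself is implemented.

```python
import hashlib
import struct

HEADER_SIZE = 80
CHECKSUM_POS_OFFSET = 72  # assumed offset of the checksumPos field
MD5_SIZE = 16


def verify_zim_checksum(path, chunk_size=1024 * 1024):
    """Return (ok, stored_hex, computed_hex) for the file at `path`."""
    with open(path, "rb") as f:
        header = f.read(HEADER_SIZE)
        if len(header) < HEADER_SIZE or header[:4] != b"ZIM\x04":
            raise ValueError("not a ZIM file")
        (checksum_pos,) = struct.unpack_from("<Q", header, CHECKSUM_POS_OFFSET)

        # Hash everything before checksumPos in fixed-size chunks so the
        # whole file never has to fit in memory.
        f.seek(0)
        md5 = hashlib.md5()
        remaining = checksum_pos
        while remaining > 0:
            chunk = f.read(min(chunk_size, remaining))
            if not chunk:
                raise ValueError("file is shorter than checksumPos")
            md5.update(chunk)
            remaining -= len(chunk)

        stored = f.read(MD5_SIZE)
    return stored == md5.digest(), stored.hex(), md5.hexdigest()
```

Reading in chunks matters here because the dump is far larger than the available RAM.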

Thanks for the update. I was unaware of zimcheck. That does seem to confirm that the problem is with the ZIM file and not with ZIMply. I'm open to others contributing changes to make ZIMply more robust to these kinds of errors, but it is not something I will be pursuing in the near future.

Thanks for at least creating it