cyanfish / naps2

Scan documents to PDF and more, as simply as possible.

Home Page: https://www.naps2.com

Limit number of concurrently running tesseract processes to prevent OOM situations

knochenhans opened this issue

Is your feature request related to a problem? Please describe.
It looks like NAPS2 tries to run Tesseract on all pages simultaneously, leading to high memory usage and, in my case (OCRing ~350 pages), to OOM errors.

Describe the solution you'd like
Either implement an option to limit the number of concurrently running Tesseract processes, or make process spawning memory-aware.
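To illustrate the first option, here is a minimal Python sketch of capping the number of child processes with a fixed-size worker pool. It is not how NAPS2 does it (NAPS2 is not written in Python); `run_bounded` and `tesseract_command` are hypothetical helper names, and the only Tesseract detail assumed is the basic `tesseract <image> <outputbase>` command-line form.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def tesseract_command(image_path):
    """Build a basic Tesseract call: tesseract <image> <outputbase>."""
    # Output base is the image path without its extension (illustrative choice).
    output_base = str(Path(image_path).with_suffix(""))
    return ["tesseract", image_path, output_base]


def run_bounded(commands, max_processes):
    """Run each command as a child process, at most max_processes at a time.

    Returns the exit codes in the same order as the input commands.
    """
    if max_processes < 1:
        raise ValueError("max_processes must be at least 1")

    def run_one(cmd):
        # Each worker thread blocks on exactly one child process, so the
        # pool size is also the cap on concurrently running children.
        return subprocess.run(cmd, capture_output=True, text=True).returncode

    with ThreadPoolExecutor(max_workers=max_processes) as pool:
        # map() yields results in input order, not completion order.
        return list(pool.map(run_one, commands))
```

With something like this, a 350-page batch could be processed as `run_bounded([tesseract_command(p) for p in pages], max_processes=4)`, so peak memory scales with the cap rather than with the page count or the number of CPU threads.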

Describe alternatives you've considered
For now, I split the full batch into smaller batches and saved each one separately.

Additional context
I tested NAPS2 7.2.2.0 under Arch Linux, with about 16 GB of free RAM at the time I tried to OCR and save a batch of about 350 pages.

How many cores does your CPU have? Currently it's based on that.

Also, to check: is the memory definitely being used by the Tesseract processes and not the NAPS2 process?

Ah, that could explain it. This machine uses an AMD Ryzen 7 5825U with 16 cores, though I saw many more processes being spawned when monitoring memory usage with a task monitor. I guess it's actually spawning 2 or 3 processes per core, each with its own memory footprint.

Yeah, it would be 1 per thread = 32. It should be possible to cap that somehow.

Allegedly, setting the CPU affinity for the NAPS2 process affects the number of cores it reports, so a workaround might be to set the affinity to cores 0-7, which would limit the number of processes to 8.
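The general mechanism behind that workaround can be sketched in Python on Linux: once a process is pinned to a subset of CPUs, an affinity-aware CPU count shrinks accordingly. Whether NAPS2's own core count responds the same way is only what the comment above suggests, not something this sketch confirms. The helper names are hypothetical; `os.sched_getaffinity` and `os.sched_setaffinity` are standard library calls that are available on Linux but not on every platform.

```python
import os


def usable_cpu_count():
    """Number of CPUs this process is allowed to run on."""
    if hasattr(os, "sched_getaffinity"):  # Linux; absent on some platforms
        return len(os.sched_getaffinity(0))  # 0 means the current process
    # Fallback: total CPU count, which ignores any affinity mask.
    return os.cpu_count() or 1


def restrict_to_first_cpus(n):
    """Pin the current process to its first n allowed CPUs (Linux only)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    allowed = sorted(os.sched_getaffinity(0))
    # Slicing past the end is harmless: the mask just stays unchanged.
    os.sched_setaffinity(0, allowed[:n])
```

Child processes inherit the affinity mask, so a worker pool sized from `usable_cpu_count()` after `restrict_to_first_cpus(8)` would start at most 8 workers. From a shell, the util-linux `taskset` tool can apply the same kind of mask when launching a program.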

I've improved this in NAPS2 7.3.0. Please let me know if you still have any problems with it.