J-F-Liu / lopdf

A Rust library for PDF document manipulation.

Streaming Open/Read as Opposed to One-Shot

naftulikay opened this issue

I've been over the code a bit and I can't see if this is possible, so bear with me in case I missed something.

Does this library make it possible to open a document with a callback function to measure progress? I will be processing many large PDFs at the same time in order to extract text objects and images. Images will be further processed using OCR and some machine learning, and this will all run in an async application. Blocking the current thread until the whole document has been parsed is not ideal, but in lieu of actual async, is there at least some way I can measure progress during the read step in `Reader`?

I'm not as familiar with the PDF format as I should be, but is there any sensible way of measuring progress? Even a callback that fires as bytes are read would be helpful for my use case, so I could display some progress. I will have to test things out on some of my very large files to see what loading times look like; I assume that loading is a bit faster than actually rendering.
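
To make the "callback as bytes are read" idea concrete, here is a rough sketch of what I have in mind. It uses only the standard library and wraps any `std::io::Read`. The names `ProgressReader` and `on_progress` are made up for illustration and are not part of lopdf. It also assumes the loader can take a `Read` source, and it would only measure bytes consumed from the source, not how far parsing has gotten.

```rust
use std::io::{self, Read};

/// Hypothetical wrapper (not a lopdf type): forwards reads to `inner`
/// and reports the running total of bytes read to a callback.
pub struct ProgressReader<R, F> {
    inner: R,
    bytes_read: u64,
    on_progress: F,
}

impl<R: Read, F: FnMut(u64)> ProgressReader<R, F> {
    /// `on_progress` receives the cumulative byte count after every read call.
    pub fn new(inner: R, on_progress: F) -> Self {
        ProgressReader {
            inner,
            bytes_read: 0,
            on_progress,
        }
    }
}

impl<R: Read, F: FnMut(u64)> Read for ProgressReader<R, F> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        // Delegate the actual read, then update and report the total.
        // The callback also fires on the final zero-length read at EOF.
        let n = self.inner.read(buf)?;
        self.bytes_read += n as u64;
        (self.on_progress)(self.bytes_read);
        Ok(n)
    }
}
```

With a file, I would compare the reported count against the length from `std::fs::metadata` to get a percentage. If the library reads the whole file into memory before parsing, this would only cover the I/O portion, which is why a hook inside the parser itself would be more useful.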