Parse Image on the fly

Question

FrsECM opened this issue 4 months ago · comments

François Ponchon commented 4 months ago

Feature request
Hello!
I'm using pdfminer to parse files that will be used in a RAG pipeline.

Context

To do that, I would like to do two things:

Image captioning on the fly
Table Parsing on the fly

Currently, these two components are hard to parse on the fly.

Images

To process images, I would like to create a PIL image on the fly from an LTImage.
Currently, the ImageWriter class takes a folder as input.
=> It would be great to have another class, ImageWriterPIL, that generates the PIL image.
ImageWriter would then just override this behaviour to save the PIL image, nothing more.
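
To make the idea concrete, here is a minimal sketch of the conversion I have in mind. The function name ltimage_to_pil is hypothetical and is not part of pdfminer. It only relies on the LTImage attributes stream, srcsize and bits, and it assumes that stream.get_data() returns the JPEG bytes unchanged for JPEG-encoded images. It only covers JPEG / JPEG 2000 payloads and raw 8-bit gray or RGB samples; other cases (1-bit masks, CMYK, indexed colours, JBIG2) are left out.

```python
import io

from PIL import Image, UnidentifiedImageError


def ltimage_to_pil(image):
    """Best-effort conversion of an LTImage-like object to a PIL image.

    Hypothetical helper, not a pdfminer API. Uses only the attributes
    `stream`, `srcsize` and `bits` of the image object.
    """
    data = image.stream.get_data()

    # JPEG and JPEG 2000 payloads are self-describing, so PIL can open them
    # directly. Restricting `formats` avoids misreading raw samples.
    try:
        pil_image = Image.open(io.BytesIO(data), formats=["JPEG", "JPEG2000"])
        pil_image.load()
        return pil_image
    except UnidentifiedImageError:
        pass

    # Otherwise assume raw 8-bit samples and infer the mode from the length.
    width, height = image.srcsize
    if image.bits == 8:
        if len(data) == width * height * 3:
            return Image.frombytes("RGB", (width, height), data)
        if len(data) == width * height:
            return Image.frombytes("L", (width, height), data)
    raise ValueError("unsupported image encoding")
```

With something like this, ImageWriter could keep its current job and simply call save() on the returned image.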

Tables

Detecting and parsing tables is sometimes challenging, and the fix is not as easy as the one for images,
especially when there are multiple tables on the same page.

For example, this file:
Mathematical Foundations of Image Processing and Analysis 2 - 2014 - Pinoli - Table of Acronyms.pdf

It has multiple tables on the same page, and the result is a lot of LTRect/LTTextBox objects that are complicated to interpret.
It would be great if we could (optionally) have a layout component "LTTable" to handle that.
We could do this by:

Rendering the page as a low-resolution image.
Using an AI model to detect tables, for example: https://huggingface.co/microsoft/table-transformer-detection
Adding the objects inside each detected bounding box to an LTTable container, which would allow them to be parsed more reliably.
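
The last step could look roughly like the sketch below. Everything in it is hypothetical: TableGroup stands in for the proposed LTTable, and pixel_box_to_pdf and group_into_tables are illustrative names. It assumes the detector returns boxes as (left, top, right, bottom) in image pixels with the origin at the top-left, that layout objects expose a bbox of (x0, y0, x1, y1) in PDF points with the origin at the bottom-left, and that the page origin is at (0, 0).

```python
from dataclasses import dataclass, field
from typing import Any, Iterable, List, Tuple

BBox = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def pixel_box_to_pdf(box: BBox, image_size: Tuple[float, float],
                     page_size: Tuple[float, float]) -> BBox:
    """Convert a detector box (pixels, top-left origin) to PDF points."""
    left, top, right, bottom = box
    img_w, img_h = image_size
    page_w, page_h = page_size
    sx, sy = page_w / img_w, page_h / img_h
    # The y axis is flipped: the bottom of the pixel box becomes y0.
    return (left * sx, page_h - bottom * sy, right * sx, page_h - top * sy)


def contains(outer: BBox, inner: BBox, tol: float = 1.0) -> bool:
    """True if `inner` lies inside `outer`, with a small tolerance in points."""
    return (inner[0] >= outer[0] - tol and inner[1] >= outer[1] - tol
            and inner[2] <= outer[2] + tol and inner[3] <= outer[3] + tol)


@dataclass
class TableGroup:
    """Hypothetical stand-in for the proposed LTTable container."""
    bbox: BBox
    items: List[Any] = field(default_factory=list)


def group_into_tables(objects: Iterable[Any], table_boxes: Iterable[BBox]):
    """Assign each object to the first table containing it; return the rest."""
    tables = [TableGroup(bbox) for bbox in table_boxes]
    rest = []
    for obj in objects:
        for table in tables:
            if contains(table.bbox, obj.bbox):
                table.items.append(obj)
                break
        else:
            rest.append(obj)
    return tables, rest
```

Objects that straddle a table border end up in the second return value here; a real implementation would probably want an overlap threshold instead of strict containment.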

All of this can be done inside the library or outside it, but at the very least we need to be able to:

Render a page as a PIL image.

This feature is available in the pdfreader library, but I prefer pdfminer for a lot of other things.