Parse Image on the fly
FrsECM opened this issue
Feature request
Hello!
I'm using pdfminer to parse files that will be used in a RAG pipeline.
Context
In order to do that, I would like to do two things:
- Image captioning on the fly
- Table Parsing on the fly
Currently, these two components are hard to handle on the fly.
Images
In order to process images, I would like to create a PIL image on the fly from an LTImage.
Currently, the ImageWriter class takes an output folder as input.
=> It would be great to have another class, ImageWriterPIL, that generates the PIL image.
ImageWriter would then just extend this behaviour to save the PIL image, nothing more.
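To make the request concrete, here is a minimal sketch of the in-memory conversion, not an existing pdfminer API. The function name `ltimage_to_pil` is hypothetical. It assumes the object behaves like an LTImage, i.e. it has a `stream` attribute with a `get_data()` method, and it only covers the easy case where the embedded stream is already a complete image file (typically a JPEG). Raw pixel streams would need to be rebuilt from the image size, bit depth and colorspace, which is not attempted here.

```python
import io
from typing import Optional

from PIL import Image, UnidentifiedImageError


def ltimage_to_pil(lt_image) -> Optional[Image.Image]:
    """Return a PIL image for an LTImage-like object, or None if unsupported.

    Assumption: `lt_image.stream.get_data()` returns the embedded bytes.
    This only works when those bytes are a complete image file (e.g. JPEG).
    """
    data = lt_image.stream.get_data()
    if not data:
        return None
    try:
        image = Image.open(io.BytesIO(data))
        image.load()  # force decoding now, while the buffer is still alive
    except UnidentifiedImageError:
        # Raw pixel data: would have to be rebuilt from size/bits/colorspace.
        return None
    return image
```

With something like this, saving to a folder becomes one extra step on top (`image.save(path)`), which is the split between ImageWriterPIL and ImageWriter proposed above.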
Tables
Sometimes, it's challenging to detect and parse tables, especially when there are multiple tables on the same page.
The fix is not as easy as the one for images.
For example, this file:
Mathematical Foundations of Image Processing and Analysis 2 - 2014 - Pinoli - Table of Acronyms.pdf
We have multiple tables on the same page, and the result is a lot of LTRect/LTTextBox objects that are complicated to interpret.
It would be great if we could (optionally) have a layout component "LTTable" to handle that.
We can do this by:
- Rendering the page as a low-resolution image.
- Using an AI model to detect tables, for example: https://huggingface.co/microsoft/table-transformer-detection
- Adding the objects inside each detected bounding box to an LTTable container, which would allow parsing them more reliably.
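The last step above could look roughly like the sketch below. Everything in it is an assumption made for illustration: `LTTable` is the container proposed in this issue and does not exist in pdfminer, and `pixel_box_to_pdf` and `group_into_tables` are hypothetical helpers. It assumes the detector returns boxes in image pixels with a top-left origin, while layout objects expose a `bbox` of `(x0, y0, x1, y1)` in PDF points with a bottom-left origin.

```python
from dataclasses import dataclass, field
from typing import Any, List, Sequence, Tuple

BBox = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


@dataclass
class LTTable:
    """Hypothetical container for the layout objects of one detected table."""
    bbox: BBox
    objects: List[Any] = field(default_factory=list)


def pixel_box_to_pdf(box: BBox, scale: float, page_height: float) -> BBox:
    """Convert a top-left-origin pixel box to bottom-left-origin PDF points.

    `scale` is pixels per PDF point (render DPI / 72).
    """
    px0, py0, px1, py1 = box
    return (px0 / scale, page_height - py1 / scale,
            px1 / scale, page_height - py0 / scale)


def contains(outer: BBox, inner: BBox, tol: float = 1.0) -> bool:
    """True if `inner` lies within `outer`, allowing `tol` points of slack."""
    return (inner[0] >= outer[0] - tol and inner[1] >= outer[1] - tol
            and inner[2] <= outer[2] + tol and inner[3] <= outer[3] + tol)


def group_into_tables(objects: Sequence[Any], table_boxes: Sequence[BBox]):
    """Assign each object to the first table box containing it.

    Returns (tables, leftovers); objects only need a `.bbox` attribute.
    """
    tables = [LTTable(bbox=box) for box in table_boxes]
    leftovers = []
    for obj in objects:
        for table in tables:
            if contains(table.bbox, obj.bbox):
                table.objects.append(obj)
                break
        else:
            leftovers.append(obj)
    return tables, leftovers
```

The coordinate conversion is the part that is easy to get wrong, which is one reason it would be convenient to have the page rendering and the layout objects come from the same library.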
All of this can be done inside the library or outside it, but at least we need to be able to:
- Render a page as a PIL image.
This feature is available in the pdfreader library, but I prefer pdfminer for a lot of other things.