J-F-Liu / lopdf

A Rust library for PDF document manipulation.

[Question] How to get bounding box information for text on a page?

arifd opened this issue.

Hello, I'm not very familiar with the PDF spec, so perhaps this already has me at a disadvantage.

But I can't work out how to get at all the text on a page and find its coordinates.

Basically, I want to extract the positions of all the text on a page of a document.

I would also like to know whether this is currently achievable; if not, I'd support it as a feature request. Something similar to the PDF.js API would be great: its TextItem exposes the text content, transform, and width/height, among some other fields.

lopdf doesn't provide a method to do this, but all the information you need is inside the PDF. You will, however, need another crate to parse the font and give you the font metrics. I might try to put together an example later.
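As a rough, non-authoritative illustration of the approach described above, here is a dependency-free sketch of the text-positioning rules from the PDF spec: track the current transformation matrix (`cm`) and the text matrix (`BT`, `Td`, `Tm`), then advance the text matrix by glyph widths on each `Tj`. The `Op` enum is hypothetical and stands in for decoded content-stream operators. With lopdf, assuming its `content::Content::decode` API, each decoded `Operation` carries an `operator` string plus `operands`, which you would map onto something like this. The `width_of` closure is also hypothetical; it stands in for the font metrics that another crate (or the font's `/Widths` array) would provide. Character/word spacing, horizontal scaling, text rise, `q`/`Q`, `TJ`, and font encodings are all left out, and `height` is only the em size, not a true glyph bounding box.

```rust
// Dependency-free sketch of PDF text positioning (PDF spec, "Text Objects").
// `Op` and `width_of` are hypothetical stand-ins, not lopdf types.

/// Affine matrix in PDF's [a b c d e f] form (row-vector convention).
#[derive(Clone, Copy, Debug, PartialEq)]
struct Matrix { a: f64, b: f64, c: f64, d: f64, e: f64, f: f64 }

impl Matrix {
    const IDENTITY: Matrix = Matrix { a: 1.0, b: 0.0, c: 0.0, d: 1.0, e: 0.0, f: 0.0 };

    fn new(a: f64, b: f64, c: f64, d: f64, e: f64, f: f64) -> Self {
        Matrix { a, b, c, d, e, f }
    }

    fn translate(tx: f64, ty: f64) -> Self {
        Matrix::new(1.0, 0.0, 0.0, 1.0, tx, ty)
    }

    /// self × other, in the order the PDF spec writes matrix products.
    fn mul(&self, o: &Matrix) -> Matrix {
        Matrix {
            a: self.a * o.a + self.b * o.c,
            b: self.a * o.b + self.b * o.d,
            c: self.c * o.a + self.d * o.c,
            d: self.c * o.b + self.d * o.d,
            e: self.e * o.a + self.f * o.c + o.e,
            f: self.e * o.b + self.f * o.d + o.f,
        }
    }
}

/// Hypothetical subset of decoded content-stream operators.
enum Op {
    Cm(Matrix),   // cm: CTM' = M × CTM
    Bt,           // BT: reset text matrix and text line matrix
    Tf(f64),      // Tf: only the font size is kept; font name omitted
    Td(f64, f64), // Td: move to the start of the next line
    Tm(Matrix),   // Tm: set text matrix and text line matrix
    Tj(String),   // Tj: show text (assumed already decoded to Unicode)
}

#[derive(Debug)]
struct TextItem { text: String, x: f64, y: f64, width: f64, height: f64 }

/// `width_of` returns a glyph's advance width in glyph-space units
/// (1/1000 of text space), e.g. from the font's /Widths array.
fn extract(ops: &[Op], width_of: impl Fn(char) -> f64) -> Vec<TextItem> {
    let (mut ctm, mut tm, mut tlm) = (Matrix::IDENTITY, Matrix::IDENTITY, Matrix::IDENTITY);
    let mut font_size = 0.0;
    let mut items = Vec::new();
    for op in ops {
        match op {
            Op::Cm(m) => ctm = m.mul(&ctm),
            Op::Bt => { tm = Matrix::IDENTITY; tlm = Matrix::IDENTITY; }
            Op::Tf(size) => font_size = *size,
            Op::Td(tx, ty) => { tlm = Matrix::translate(*tx, *ty).mul(&tlm); tm = tlm; }
            Op::Tm(m) => { tm = *m; tlm = *m; }
            Op::Tj(s) => {
                // Text rendering matrix with Th = 1 and Trise = 0:
                // Trm = [Tfs 0 0 Tfs 0 0] × Tm × CTM
                let trm = Matrix::new(font_size, 0.0, 0.0, font_size, 0.0, 0.0).mul(&tm).mul(&ctm);
                // Total advance in text space, ignoring Tc and Tw.
                let advance: f64 = s.chars().map(|c| width_of(c) / 1000.0 * font_size).sum();
                let user = tm.mul(&ctm);
                items.push(TextItem {
                    text: s.clone(),
                    x: trm.e, // baseline origin in user space
                    y: trm.f,
                    width: advance * user.a.hypot(user.b),
                    // Em height only; a real box needs ascent/descent from the font.
                    height: trm.c.hypot(trm.d),
                });
                // Move the text matrix past the shown string.
                tm = Matrix::translate(advance, 0.0).mul(&tm);
            }
        }
    }
    items
}

fn main() {
    // Roughly "BT /F1 12 Tf 72 700 Td (Hi) Tj (A) Tj ET", every glyph 500 units wide.
    let ops = vec![Op::Bt, Op::Tf(12.0), Op::Td(72.0, 700.0), Op::Tj("Hi".into()), Op::Tj("A".into())];
    for item in extract(&ops, |_| 500.0) {
        println!("{item:?}");
    }
}
```

The positions come out in PDF user space (origin at the bottom-left of the page by default), so matching PDF.js-style viewport coordinates would still need the page's `/MediaBox` and a y-flip. Per-glyph boxes would follow the same pattern, emitting one item per character instead of one per `Tj`.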