Any way to detect formatting?

Question

enrac5 opened this issue 8 months ago

Discussed in #1106

Originally posted by enrac5 on March 8, 2024

Hi there, I am parsing a PDF with tables and I'd like to detect formatting such as italics and bold in the text. Any ideas on whether that's possible (or any hacks anyone has), and how to do it?

Edit: I have this code snippet that works for characters:

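```python
import pdfplumber

pdf_path = "/tmp/Foo_1.pdf"

# Open the PDF and print the font name of every character on the first page.
with pdfplumber.open(pdf_path) as pdf:
    page = pdf.pages[0]
    for char in page.chars:
        # Font names often encode the style, e.g. "...-Bold" or "...-Italic".
        print(char["fontname"])
```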

Which is great, but how do I do this for a given table?

Jeremy Singer-Vine · Answer 1 · Mon Mar 11 2024 09:31:24 GMT+0800 (China Standard Time)

Let's keep this in one thread; closing in favor of #1106
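
For the table case, one possible hack is sketched below. It is not taken from #1106 and is not an official pdfplumber recipe. It assumes `page.find_tables()` finds the table, that each row's `cells` are bounding boxes in `(x0, top, x1, bottom)` form (or `None` for a missing cell), and that the font name contains words like "Bold", "Italic", or "Oblique", which many fonts do but which is not guaranteed. The helper names (`classify_font`, `chars_in_bbox`, `cell_fonts`, `table_fonts`) are made up for illustration.

```python
def classify_font(fontname):
    """Guess bold/italic from a font name (heuristic; naming is not standardized)."""
    name = fontname.lower()
    return {
        "bold": "bold" in name,
        "italic": "italic" in name or "oblique" in name,
    }


def chars_in_bbox(chars, bbox):
    """Return the chars whose center point lies inside bbox = (x0, top, x1, bottom)."""
    x0, top, x1, bottom = bbox
    selected = []
    for c in chars:
        cx = (c["x0"] + c["x1"]) / 2
        cy = (c["top"] + c["bottom"]) / 2
        if x0 <= cx <= x1 and top <= cy <= bottom:
            selected.append(c)
    return selected


def cell_fonts(chars, cell_bbox):
    """Summarize the text and the distinct font names found in one table cell."""
    inside = chars_in_bbox(chars, cell_bbox)
    return {
        # Chars are joined in the order given, which may not be reading order.
        "text": "".join(c["text"] for c in inside),
        "fonts": sorted({c["fontname"] for c in inside}),
    }


def table_fonts(pdf_path, page_number=0):
    """For each table on a page, return rows of per-cell font summaries."""
    import pdfplumber  # imported here so the helpers above work without it

    tables = []
    with pdfplumber.open(pdf_path) as pdf:
        page = pdf.pages[page_number]
        for table in page.find_tables():
            rows = []
            for row in table.rows:
                rows.append([
                    cell_fonts(page.chars, cell) if cell is not None else None
                    for cell in row.cells
                ])
            tables.append(rows)
    return tables
```

Calling `table_fonts("/tmp/Foo_1.pdf")` should give, per table, a list of rows whose cells carry their text and font names; passing each font name to `classify_font` then gives a rough bold/italic guess. A cell with more than one font name probably mixes styles, in which case checking character by character may be needed.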