Any way to detect formatting?
enrac5 opened this issue
enrac5 commented
Discussed in #1106
Originally posted by enrac5 March 8, 2024
Hi there, I am parsing a PDF with tables and I'd like to be able to detect formatting like italics and bold in the text. Any ideas on whether that's possible (or any hacks anyone has), and how to do it?
Edit: I have this code snippet that works for characters:
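```python
import pdfplumber

pdf_path = "/tmp/Foo_1.pdf"

with pdfplumber.open(pdf_path) as pdf:
    page = pdf.pages[0]
    # Each char dict records the name of the font it was drawn with;
    # bold/italic variants usually show up in that name.
    for char in page.chars:
        print(char["fontname"])
```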
Which is great, but how do I do this for a given table?
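One possible way to extend the per-character approach to a table, offered as a rough sketch rather than a confirmed answer: take the cell bounding boxes from `page.find_tables()`, keep only the chars whose center falls inside each box, and inspect their `fontname`. The helper names `font_style`, `chars_in_bbox`, and `table_cell_styles` are made up for illustration, and the assumption that bold/italic fonts carry "Bold", "Italic", or "Oblique" in their names is only a heuristic that will not hold for every PDF.

```python
def font_style(fontname):
    """Guess bold/italic from a font name such as 'ABCDEF+Arial-BoldItalicMT'.

    Assumption: the font name contains these keywords. Many PDFs follow
    that convention, but nothing guarantees it.
    """
    name = fontname.lower()
    return {
        "bold": "bold" in name,
        "italic": "italic" in name or "oblique" in name,
    }


def chars_in_bbox(chars, bbox):
    """Keep chars whose center point lies inside bbox = (x0, top, x1, bottom)."""
    x0, top, x1, bottom = bbox
    selected = []
    for c in chars:
        cx = (c["x0"] + c["x1"]) / 2
        cy = (c["top"] + c["bottom"]) / 2
        if x0 <= cx <= x1 and top <= cy <= bottom:
            selected.append(c)
    return selected


def table_cell_styles(page):
    """Yield (cell_bbox, text, style) for each cell of each detected table.

    `page` is expected to be a pdfplumber page. Text is joined in the
    order the chars appear in page.chars, which may not be reading order.
    """
    for table in page.find_tables():
        for cell_bbox in table.cells:
            cell_chars = chars_in_bbox(page.chars, cell_bbox)
            styles = [font_style(c["fontname"]) for c in cell_chars]
            yield (
                cell_bbox,
                "".join(c["text"] for c in cell_chars),
                {
                    # True if any char in the cell looks bold / italic
                    "bold": any(s["bold"] for s in styles),
                    "italic": any(s["italic"] for s in styles),
                },
            )


# Hypothetical usage, reusing the path from the snippet above:
#   import pdfplumber
#   with pdfplumber.open("/tmp/Foo_1.pdf") as pdf:
#       for bbox, text, style in table_cell_styles(pdf.pages[0]):
#           print(text, style)
```

Filtering by center point rather than cropping the page keeps chars that slightly overlap a cell border; whether that is the right call probably depends on the PDF.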
Jeremy Singer-Vine commented
Let's keep this in one thread; closing in favor of #1106