jsvine / pdfplumber

Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.

Any way to detect formatting?

enrac5 opened this issue

Discussed in #1106

Originally posted by enrac5 March 8, 2024
Hi there, I am parsing a PDF with tables and I'd like to be able to detect formatting such as italics and bold in the text. Any ideas on whether that's possible (or any hacks anyone has), and how to do it?

Edit: I have this code snippet that works for characters:

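```python
import pdfplumber

pdf_path = "/tmp/Foo_1.pdf"

# Open the PDF and print the font name of every character on the first page.
with pdfplumber.open(pdf_path) as pdf:
    page = pdf.pages[0]
    for char in page.chars:
        # Font names often encode the style, e.g. "...-Bold" or "...-Italic".
        print(char["fontname"])
```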

Which is great, but how do I do this for a given table?
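
One possible approach, sketched here rather than confirmed in this thread: find the tables with `page.find_tables()`, then keep only the characters whose centers fall inside each cell's bounding box and inspect their `fontname`. The helper names `font_style`, `chars_in_bbox`, and `table_cell_fonts` are hypothetical, and the bold/italic check is only a heuristic that assumes the font name contains words like "Bold", "Italic", or "Oblique", which not every PDF guarantees.

```python
def font_style(fontname):
    """Guess bold/italic from a font name (heuristic, not guaranteed)."""
    name = fontname.lower()
    return {
        "bold": "bold" in name,
        "italic": "italic" in name or "oblique" in name,
    }


def chars_in_bbox(chars, bbox):
    """Return the chars whose center lies inside bbox = (x0, top, x1, bottom)."""
    x0, top, x1, bottom = bbox
    kept = []
    for c in chars:
        cx = (c["x0"] + c["x1"]) / 2
        cy = (c["top"] + c["bottom"]) / 2
        if x0 <= cx <= x1 and top <= cy <= bottom:
            kept.append(c)
    return kept


def table_cell_fonts(pdf_path, page_number=0):
    """Collect (cell_bbox, text, font names) for every table cell on a page."""
    import pdfplumber  # imported lazily so the helpers above work without it

    results = []
    with pdfplumber.open(pdf_path) as pdf:
        page = pdf.pages[page_number]
        for table in page.find_tables():
            # table.cells is a list of (x0, top, x1, bottom) boxes.
            for cell_bbox in table.cells:
                cell_chars = chars_in_bbox(page.chars, cell_bbox)
                # Chars are joined in page.chars order, which may differ
                # from reading order in some PDFs.
                text = "".join(c["text"] for c in cell_chars)
                fonts = sorted({c["fontname"] for c in cell_chars})
                results.append((cell_bbox, text, fonts))
    return results
```

Calling `table_cell_fonts("/tmp/Foo_1.pdf")` would then give one entry per cell, and `font_style(name)` can be applied to each font name to flag cells that look bold or italic.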

Let's keep this in one thread; closing in favor of #1106