hrbrmstr / pdfbox

📄◻️ Create, Manipulate and Extract Data from PDF Files (R Apache PDFBox wrapper)

Function to extract bold or italicised text

sanjmeh opened this issue

I was redirected to this package from the SO question https://stackoverflow.com/q/53398611/1972786.
I see only four functions in version 0.2.0 of the pdfbox R package:

extract_text, extract_uris, image_count, pdf_info

I tried all four; none of them can be used to extract bold or italicised words from a PDF document.

Could you shed some light on this? Also, is there any undocumented way to get the metadata of the text extracted from PDF files?

This package, together with the package it depends on, provides the JARs that let R call the PDFBox Java functions directly. As noted in the "answer" on Stack Overflow, thinking outside the box is likely a better option.
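
For anyone who does go the Java route, one possible approach (not something this package wraps) is to look at the font name attached to each run of text and guess the style from it. In PDFBox 2.x the font of a text position should be reachable through `TextPosition.getFont().getName()`, but treat that as an assumption to check against the PDFBox version you have. The sketch below covers only the name-classification step in plain Java. The class `FontStyleSketch` and its methods are hypothetical helpers, and the heuristic assumes the font name encodes the style (for example "Arial-BoldMT"), which a PDF does not guarantee.

```java
import java.util.Locale;

// Hypothetical helper, not part of PDFBox or the pdfbox R package.
// Guesses bold/italic from a PDF font name such as "ABCDEF+Arial-BoldMT".
public class FontStyleSketch {

    // Subset fonts carry a six-letter tag followed by '+'; drop it so the
    // tag itself cannot be mistaken for a style keyword.
    static String baseName(String fontName) {
        int plus = fontName.indexOf('+');
        return plus == 6 ? fontName.substring(7) : fontName;
    }

    // Assumption: bold faces have "bold" somewhere in the font name.
    static boolean isBold(String fontName) {
        return baseName(fontName).toLowerCase(Locale.ROOT).contains("bold");
    }

    // Assumption: slanted faces are named "italic" or "oblique".
    static boolean isItalic(String fontName) {
        String name = baseName(fontName).toLowerCase(Locale.ROOT);
        return name.contains("italic") || name.contains("oblique");
    }

    public static void main(String[] args) {
        String[] samples = {
            "ABCDEF+Arial-BoldMT",
            "TimesNewRomanPS-ItalicMT",
            "Helvetica-BoldOblique",
            "Helvetica"
        };
        for (String s : samples) {
            System.out.printf("%s bold=%b italic=%b%n", s, isBold(s), isItalic(s));
        }
    }
}
```

Fonts whose names carry no style keyword will be missed by this heuristic, so it is a starting point rather than a reliable detector.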