chrismattmann / tika-python

Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community.

How to access Apache Tika's Recursive JSON object using tika-python?

NLPOR opened this issue

I'm using Apache Tika to OCR a batch of PDFs. When I use the GUI (launched with java -jar tika-app-1.22.jar), everything works fine: I choose "Recursive JSON" from the "View" menu and the text is all there (even though nothing appears under "Main Content"). With the Python wrapper, however, I don't see any option to extract the "Recursive JSON" output, and print(parsed['content']) returns an empty string, although print(parsed['metadata']) returns the metadata correctly. I need the content. What am I missing?

Without the file you were testing, I can't really comment on this. It seems the problem stems from upstream, in the Tika library itself, so I recommend asking about it on dev@tika.apache.org. @tballison
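
For anyone who wants to inspect the recursive output from Python while waiting on an upstream answer, here is a minimal sketch that bypasses the wrapper and talks to a running tika-server directly. It assumes a server is listening on http://localhost:9998 and that its /rmeta/text endpoint returns a JSON list with one metadata object per container or embedded document, with the extracted text under the "X-TIKA:content" key. The helper names fetch_rmeta and collect_content are made up for illustration and are not part of tika-python. Whether any OCR text shows up still depends on how the server itself is set up (for example, whether Tesseract is available to it), which this sketch does not address.

```python
import json
import sys
import urllib.request

# Key under which tika-server's recursive metadata output stores extracted text
# (assumption based on the /rmeta output format).
TIKA_CONTENT_KEY = "X-TIKA:content"


def fetch_rmeta(path, server="http://localhost:9998"):
    """PUT a file to the server's /rmeta/text endpoint and return the parsed JSON list.

    Hypothetical helper: the server URL is an assumption about a local setup.
    """
    with open(path, "rb") as fh:
        body = fh.read()
    req = urllib.request.Request(
        server + "/rmeta/text",
        data=body,
        method="PUT",
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def collect_content(rmeta_docs):
    """Join the non-empty text of every entry in a recursive metadata list.

    The container document may have empty content while embedded documents
    (such as page images) carry the text, so every entry is checked.
    """
    parts = []
    for doc in rmeta_docs:
        text = doc.get(TIKA_CONTENT_KEY)
        if text and text.strip():
            parts.append(text.strip())
    return "\n\n".join(parts)


if __name__ == "__main__":
    # Usage: python rmeta_text.py some.pdf
    docs = fetch_rmeta(sys.argv[1])
    print(collect_content(docs))
```

If the joined text is non-empty here but parsed['content'] is still empty, that would help narrow down whether the difference lies in the wrapper or in the server's configuration.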