chrismattmann / tika-python

Tika-Python is a Python binding to the Apache Tika™ REST services, allowing Tika to be called natively from Python.

PDF Text extraction: Date superscript split into separate lines

teohsinyee opened this issue

Screenshot of the original text in my PDF:
[screenshot not preserved]

Result of extraction:
[screenshot not preserved]

Is there a setting that extracts this line exactly as "2nd of March 2015 onwards" instead of splitting it into three lines?
Much appreciated!
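As a client-side stopgap, the extracted text can be post-processed to rejoin a superscript ordinal suffix with its day number. The sketch below is a heuristic, not a Tika or tika-python feature: the function name `rejoin_ordinal_suffixes` is hypothetical, and because the screenshots are not available it assumes the split looks like `2` / `nd` / `of March 2015 onwards` (day number, then the suffix alone on its own line, then the rest of the date). A layout where the suffix is emitted before the number would need a different rule.

```python
import re

# Hypothetical heuristic: a line that holds nothing but an English ordinal suffix.
_SUFFIX_LINE = re.compile(r"^\s*(st|nd|rd|th)\s*$", re.IGNORECASE)
# A line ending in a 1- or 2-digit day number; the lookbehind skips years like "2015".
_ENDS_WITH_DAY = re.compile(r"(?<!\d)\d{1,2}\s*$")


def rejoin_ordinal_suffixes(text: str) -> str:
    """Merge '2' / 'nd' / 'of March 2015 onwards' back into one line.

    Assumes the suffix sits on its own line directly after the line ending
    in the day number, and that the rest of the date follows on the next
    non-empty line. Lines that do not match are returned unchanged.
    """
    lines = text.split("\n")
    out = []
    i = 0
    while i < len(lines):
        line = lines[i]
        if (
            i + 1 < len(lines)
            and _ENDS_WITH_DAY.search(line)
            and _SUFFIX_LINE.match(lines[i + 1])
        ):
            # Attach the suffix directly to the number: "2" + "nd" -> "2nd".
            merged = line.rstrip() + lines[i + 1].strip()
            if i + 2 < len(lines) and lines[i + 2].strip():
                # Pull the continuation ("of March 2015 onwards") onto the same line.
                merged += " " + lines[i + 2].strip()
                i += 3
            else:
                i += 2
            out.append(merged)
        else:
            out.append(line)
            i += 1
    return "\n".join(out)
```

With tika-python, the input would be the `content` field of the dictionary returned by `parser.from_file(...)` (which can be `None` for empty documents, so guard it). The lookbehind on the day pattern is a deliberate choice so that a year such as "2015" followed by a stray "th" line is not merged.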

I think this can be handled in the upstream Tika server library. Please ask on dev@tika.apache.org. cc @tballison, and thanks @teohsinyee!