chrismattmann / tika-python

Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community.

Checkboxes convert to FORMCHECKBOX

claire-herdeman opened this issue

Checkboxes from Word documents convert to the text "FORMCHECKBOX" and lose any info about whether or not they are checked. Is it possible to render those differently and ideally maintain the "checked" status?

For instance, a row of checkboxes in a document is converted as follows (using the xmlContent=True flag):

<p><b>Current Permanency Plan</b>:   FORMCHECKBOX 
 concurrent plan      FORMCHECKBOX 
 reunification      FORMCHECKBOX 
 adoption      FORMCHECKBOX 
 emancipation/transition     FORMCHECKBOX 
 guardianship     </p>
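To make the output above concrete, here is a minimal post-processing sketch. It assumes each label follows its FORMCHECKBOX placeholder, as in the sample; `SAMPLE` and `checkbox_labels` are hypothetical names, not part of tika-python. It only recovers the labels, since the checked state is not present in this output at all.

```python
import re

# Fragment copied from the converted output shown above.
SAMPLE = (
    "<p><b>Current Permanency Plan</b>:   FORMCHECKBOX \n"
    " concurrent plan      FORMCHECKBOX \n"
    " reunification      FORMCHECKBOX \n"
    " adoption      FORMCHECKBOX \n"
    " emancipation/transition     FORMCHECKBOX \n"
    " guardianship     </p>"
)


def checkbox_labels(xhtml_fragment):
    """Return the label text that follows each FORMCHECKBOX placeholder.

    Assumption: every label comes directly after its placeholder.
    """
    # Strip tags so only the text content remains.
    text = re.sub(r"<[^>]+>", "", xhtml_fragment)
    # Text before the first placeholder is the field heading, not a label.
    parts = text.split("FORMCHECKBOX")[1:]
    # Collapse runs of whitespace and newlines inside each label.
    return [" ".join(part.split()) for part in parts]


print(checkbox_labels(SAMPLE))
```

In real use the fragment would presumably come from the "content" entry of the dict returned by `parser.from_file(path, xmlContent=True)`, which needs a running Tika server.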

This is likely due to the underlying Tika and Tika server libraries. Please ask on dev@tika.apache.org. @tballison