chrismattmann / tika-python

Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community.

How to structure the parsed output of compressed files, such as RAR or ZIP archives?

NLPOR opened this issue:

I found that when parsing a compressed file, the contents of all files inside the archive are mixed together in the single content field.
For example, test.zip contains test/a.txt and test/b.txt. After parsing:
parsed = parser.from_file('test.zip')
parsed['content'] = .....
parsed['metadata'] = ......
How can I get the 'content' and 'metadata' of each file in the archive separately, keyed by its file name?

This is an issue in the upstream Tika server library's response. Please ask this question on dev@tika.apache.org.
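
In the meantime, one possible client-side workaround for ZIP files is to unpack the archive locally and parse each member on its own, so that every entry keeps its own 'content' and 'metadata'. The sketch below is only an illustration under some assumptions: `parse_zip_members` and its `parse_fn` argument are hypothetical names introduced here (they are not part of tika-python), the archive is a plain, unencrypted zip, and by default it falls back to `parser.from_file`, which needs a reachable Tika server. RAR archives are not covered, since the Python standard library has no RAR reader.

```python
import os
import tempfile
import zipfile


def parse_zip_members(zip_path, parse_fn=None):
    """Parse each file inside a zip separately, keyed by its path in the archive.

    parse_fn is a hypothetical hook: any callable that takes a file path and
    returns a dict. If omitted, tika's parser.from_file is used, which
    requires a reachable Tika server.
    """
    if parse_fn is None:
        # Imported lazily so the helper can be exercised without Tika installed.
        from tika import parser
        parse_fn = parser.from_file

    results = {}
    with tempfile.TemporaryDirectory() as tmpdir, zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directory entries carry no content of their own
            # extract() sanitizes the member name and returns the path on disk
            extracted_path = archive.extract(info, path=tmpdir)
            parsed = parse_fn(extracted_path)
            results[info.filename] = {
                "content": parsed.get("content"),
                "metadata": parsed.get("metadata"),
            }
    return results
```

With the archive from the question, `parse_zip_members('test.zip')` would return a dict with the keys 'test/a.txt' and 'test/b.txt', each holding that file's own 'content' and 'metadata'. The trade-off is one parse request per member instead of a single request for the whole archive, and nested archives would need the same treatment applied recursively.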