How to structure compressed files, such as rar, zip format?
NLPOR opened this issue
xiaxy commented
I found that when parsing a compressed file, the contents of all the files in its subdirectories are merged into the single content field.
E.g. test.zip => test/a.txt, test/b.txt. After parsing:
parsed = parser.from_file('test.zip')
parsed['content'] = .....
parsed['metadata'] = ......
How can I split 'content' and 'metadata' so that each file in the archive's subdirectory gets its own entry, keyed by its file name?
Chris Mattmann commented
This is an issue with the response from the upstream Tika server library. Please ask this question on dev@tika.apache.org.
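A possible client-side workaround, offered only as a rough sketch and not as anything the Tika project recommends, is to open the archive yourself and parse each entry separately, so the results can be keyed by file name. The example below uses Python's standard `zipfile` module. The names `parse_zip_by_entry` and `decode_text` are hypothetical, and the default per-entry parser simply decodes UTF-8 text; the assumption is that you would pass in your own callable (for instance one that wraps a Tika call) returning a dict with 'content' and 'metadata'.

```python
import zipfile


def decode_text(data: bytes) -> dict:
    """Placeholder per-entry parser (hypothetical): decode bytes as UTF-8 text."""
    return {"content": data.decode("utf-8", errors="replace"), "metadata": {}}


def parse_zip_by_entry(zip_source, parse_entry=decode_text):
    """Return {entry name: {'content': ..., 'metadata': ...}} for each file in a zip.

    zip_source is a path or a binary file-like object.
    parse_entry is any callable taking bytes and returning a dict
    (assumption: it yields 'content' and 'metadata' keys).
    """
    results = {}
    with zipfile.ZipFile(zip_source) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # skip directory entries such as 'test/'
            parsed = parse_entry(archive.read(info))
            # Keep the parser's metadata and add the uncompressed size from the archive.
            parsed["metadata"] = dict(parsed.get("metadata") or {}, size=info.file_size)
            results[info.filename] = parsed
    return results
```

With the archive from the question, `parse_zip_by_entry('test.zip')['test/a.txt']['content']` would then hold only the text of test/a.txt. Note that `zipfile` does not read rar archives, so those would need a different library.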