Thread crash when parsing a special file
sherllochen opened this issue · comments
Sherllo Chen commented
This file has a .doc extension, but it is actually an HTML file. When processing it, yomu runs for a very long time without any response, until I forcibly kill the thread.
Even if I rename the file to *.html, the behavior is the same, so maybe the file itself is special.
When I then parse it with Tika directly, the text is extracted correctly.
Andrew Bromwich commented
@sherllochen I don't believe this project is maintained anymore. I suggest trying a newer version of Tika (v1.14). I've forked this project and updated Tika; see https://github.com/abrom/henkei
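
In the meantime, one way to avoid having to kill the thread by hand might be to wrap the extraction call in a timeout. This is only a minimal sketch using Ruby's standard `Timeout` module; `extract_with_timeout` is a hypothetical helper name, not part of yomu or henkei, and the 30-second limit and file name in the usage comment are assumptions for illustration.

```ruby
require 'timeout'

# Hypothetical helper (not part of yomu or henkei): runs the given block and
# returns its result, or nil if it does not finish within `seconds`.
def extract_with_timeout(seconds)
  Timeout.timeout(seconds) { yield }
rescue Timeout::Error
  nil
end

# Usage sketch, assuming the yomu gem is installed and 'suspect.doc' exists:
#   text = extract_with_timeout(30) { Yomu.new('suspect.doc').text }
#   warn 'extraction timed out' if text.nil?
```

Note that `Timeout` only interrupts the Ruby side; if the extraction started an external process, that process may keep running and need separate cleanup.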