ikawaha / kagome

Self-contained Japanese Morphological Analyzer written in pure Go

Too much memory allocation in case of Shift_JIS input

syou6162 opened this issue

I'm using kagome to parse the text parts of HTML files (very useful, thanks!). Some files contain Shift_JIS-encoded characters, and for these files kagome never returns any output and consumes a large amount of memory.

Sample input:

echo "日本語" | nkf -s | kagome

Thank you for the report!

I will look into fixing it so that it stops even when given an invalid (non-UTF-8) string.

I see you have already created a branch with a fix. Thank you!

To reproduce the issue, I used the command-line mode in the example above, but I usually use kagome as a Go library. I would appreciate it if you could fix the library part as well 🙇
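For anyone calling kagome as a library, a caller-side guard is one way to avoid handing non-UTF-8 bytes to the tokenizer at all. The following is only a minimal sketch using the Go standard library, not the fix made in kagome itself; the names `checkUTF8`, `sanitizeUTF8`, and `ErrInvalidUTF8` are hypothetical and no kagome API is called. The byte sequence is "日本語" encoded as Shift_JIS, matching the sample input.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"unicode/utf8"
)

// ErrInvalidUTF8 is a hypothetical sentinel error used only in this sketch.
var ErrInvalidUTF8 = errors.New("input is not valid UTF-8")

// checkUTF8 (hypothetical helper) reports an error when s contains
// byte sequences that are not valid UTF-8, e.g. raw Shift_JIS text.
func checkUTF8(s string) error {
	if !utf8.ValidString(s) {
		return ErrInvalidUTF8
	}
	return nil
}

// sanitizeUTF8 (hypothetical helper) replaces each run of invalid bytes
// with U+FFFD so the result is always valid UTF-8. The original text is
// NOT recovered; this only makes the input safe to pass on.
func sanitizeUTF8(s string) string {
	return strings.ToValidUTF8(s, "\uFFFD")
}

func main() {
	// "日本語" encoded as Shift_JIS, as produced by `nkf -s`.
	sjis := string([]byte{0x93, 0xfa, 0x96, 0x7b, 0x8c, 0xea})

	if err := checkUTF8(sjis); err != nil {
		fmt.Println("rejecting input:", err)
	}

	// Alternatively, sanitize and continue with lossy but valid input.
	safe := sanitizeUTF8(sjis)
	fmt.Println("valid after sanitizing:", utf8.ValidString(safe))

	// Text that is already UTF-8 passes through unchanged.
	fmt.Println(sanitizeUTF8("日本語") == "日本語")
}
```

Rejecting or sanitizing loses the original text. To actually tokenize Shift_JIS content, the input would first need to be decoded to UTF-8, for example with the decoder in `golang.org/x/text/encoding/japanese`.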

This is fixed in #102.
Please try it, and let me know if you have any problems.

Thank you for the report and for using kagome!

I updated kagome, and confirmed that #102 resolved the memory issue. Thanks! I'll close this issue.

I'm using kagome to build an annotation tool based on active learning (kagome is used to extract features from HTML content). kagome is very useful because annotators do not need to download dictionary or parameter files separately.