Does it support Chinese?
pencoa opened this issue · comments
Randy Pen commented
Does it support Chinese?
When I evaluate ROUGE for Chinese summarization, I need to segment sentences into words; no NLTK stemmer is needed.
e.g.
我来自火星。 --> 我 来自 火星 。
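Once sentences are space-separated like this, a standard n-gram ROUGE implementation can treat each segment as a "word". A minimal ROUGE-1 F1 sketch over pre-segmented strings (the helper name `rouge_1_f` is mine, not part of this library):

```python
from collections import Counter

def rouge_1_f(hypothesis: str, reference: str) -> float:
    """ROUGE-1 F1 on whitespace-separated tokens (hypothetical helper)."""
    hyp = hypothesis.split()
    ref = reference.split()
    if not hyp or not ref:
        return 0.0
    # Clipped unigram overlap between hypothesis and reference
    overlap = sum((Counter(hyp) & Counter(ref)).values())
    precision = overlap / len(hyp)
    recall = overlap / len(ref)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(rouge_1_f('我 来自 火星 。', '我 来自 地球 。'))  # 0.75
```

Three of the four segments match, so precision and recall are both 0.75.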
Randy Pen commented
There are several Chinese segmenters, such as jieba.
We can use one of them to preprocess the sentences.
Randy Pen commented
I have one solution: create a temporary dict each time and map each Chinese character to a numeric string.
Input

```python
hypothesis = '乌龟状态不够稳定'
references = '服务器不稳定'

# Collect the distinct characters from both texts
hyp = list(hypothesis)
ref = list(references)
tol = list(set(hyp + ref))

# Map each character to a string id, then re-join with spaces
w2id = {w: str(idx) for idx, w in enumerate(tol)}
hyped = ' '.join(w2id[x] for x in hyp)
refed = ' '.join(w2id[x] for x in ref)

print(hyped)
print(refed)
```
Output (the exact ids vary between runs, since `set` ordering is not deterministic):

```
10 7 1 3 8 4 2 0
9 5 6 8 2 0
```
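Because `set` ordering changes between runs, the ids above are not reproducible. Assigning ids in first-seen order makes the encoding deterministic; a sketch (the function name `encode_as_ids` is mine):

```python
def encode_as_ids(hypothesis: str, reference: str) -> tuple:
    """Map each distinct character to a numeric string id shared across
    both texts, so a ROUGE scorer sees language-independent 'words'.
    Ids are assigned in first-seen order, so the output is deterministic."""
    w2id = {}
    for ch in hypothesis + reference:
        w2id.setdefault(ch, str(len(w2id)))
    hyped = ' '.join(w2id[ch] for ch in hypothesis)
    refed = ' '.join(w2id[ch] for ch in reference)
    return hyped, refed

hyped, refed = encode_as_ids('乌龟状态不够稳定', '服务器不稳定')
print(hyped)  # 0 1 2 3 4 5 6 7
print(refed)  # 8 9 10 4 6 7
```

The shared characters 不, 稳, 定 receive the same ids (4, 6, 7) in both strings, which is what lets the n-gram overlap be computed correctly.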
Jiale Guo commented
For processing Chinese texts, see https://github.com/JialeGuo/py_rouge_zh