Does it support Chinese?
pencoa opened this issue · comments
Randy Pen commented
Does it support Chinese?
When I evaluate ROUGE for Chinese summarization, I need to segment sentences into words; no NLTK stemmer is needed.
e.g.
我来自火星。 --> 我 来自 火星 。
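Once sentences are space-separated like this, a standard n-gram ROUGE implementation can treat each segment as a "word". A minimal ROUGE-1 F1 sketch over pre-segmented strings (the helper name `rouge_1_f` is mine, not part of this library):

```python
from collections import Counter

def rouge_1_f(hypothesis: str, reference: str) -> float:
    """ROUGE-1 F1 on whitespace-separated tokens (hypothetical helper)."""
    hyp = hypothesis.split()
    ref = reference.split()
    if not hyp or not ref:
        return 0.0
    # Clipped unigram overlap between hypothesis and reference
    overlap = sum((Counter(hyp) & Counter(ref)).values())
    precision = overlap / len(hyp)
    recall = overlap / len(ref)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(rouge_1_f('我 来自 火星 。', '我 来自 地球 。'))  # 0.75
```

Three of the four segments match, so precision and recall are both 0.75.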
Randy Pen commented
There are several Chinese segmenters, such as jieba.
We can use one of them to preprocess the sentences.
Randy Pen commented
I have one solution: create a temporary dict each time and map each Chinese character to a numeric string.
Input

```python
hypothesis = '乌龟状态不够稳定'
references = '服务器不稳定'

# Collect the distinct characters from both texts
hyp = list(hypothesis)
ref = list(references)
tol = list(set(hyp + ref))

# Map each character to a string id, then re-join with spaces
w2id = {w: str(idx) for idx, w in enumerate(tol)}
hyped = ' '.join(w2id[x] for x in hyp)
refed = ' '.join(w2id[x] for x in ref)

print(hyped)
print(refed)
```
Output (the exact ids vary between runs, since `set` ordering is not deterministic):

```
10 7 1 3 8 4 2 0
9 5 6 8 2 0
```
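Because `set` ordering changes between runs, the ids above are not reproducible. Assigning ids in first-seen order makes the encoding deterministic; a sketch (the function name `encode_as_ids` is mine):

```python
def encode_as_ids(hypothesis: str, reference: str) -> tuple:
    """Map each distinct character to a numeric string id shared across
    both texts, so a ROUGE scorer sees language-independent 'words'.
    Ids are assigned in first-seen order, so the output is deterministic."""
    w2id = {}
    for ch in hypothesis + reference:
        w2id.setdefault(ch, str(len(w2id)))
    hyped = ' '.join(w2id[ch] for ch in hypothesis)
    refed = ' '.join(w2id[ch] for ch in reference)
    return hyped, refed

hyped, refed = encode_as_ids('乌龟状态不够稳定', '服务器不稳定')
print(hyped)  # 0 1 2 3 4 5 6 7
print(refed)  # 8 9 10 4 6 7
```

The shared characters 不, 稳, 定 receive the same ids (4, 6, 7) in both strings, which is what lets the n-gram overlap be computed correctly.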
Jiale Guo commented
For processing Chinese texts, see https://github.com/JialeGuo/py_rouge_zh