Diego999 / py-rouge

A full Python implementation of the ROUGE metric that produces the same results as the official Perl implementation.

Does it support Chinese?

pencoa opened this issue

Does it support Chinese?
To evaluate ROUGE on Chinese summarization, I need to segment sentences into words first. No NLTK stemmer is needed.
e.g.

我来自火星。 --> 我 来自 火星 。

There are several Chinese segmenters, such as jieba.
We can use one of them to preprocess the sentences.
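
The preprocessing step could look roughly like the sketch below. It assumes the third-party jieba package and its `jieba.lcut` function; the helper name `segment` and the character-level fallback used when jieba is not installed are illustrative choices, not part of py-rouge.

```python
try:
    import jieba  # third-party segmenter: pip install jieba
except ImportError:
    jieba = None  # assumption: fall back to characters when jieba is missing


def segment(sentence):
    """Return the sentence as space-separated tokens (hypothetical helper)."""
    if jieba is not None:
        tokens = jieba.lcut(sentence)  # word-level segmentation
    else:
        tokens = list(sentence)  # character-level fallback
    # Drop whitespace-only tokens so the output has single spaces only.
    return ' '.join(tok for tok in tokens if tok.strip())


print(segment('我来自火星。'))
```

The exact word boundaries depend on jieba's dictionary, so the segmentation may differ slightly from the hand-written example above.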

I have one solution: build a temporary dict for each hypothesis/reference pair and map every Chinese token (a single character in the example below) to a number string.

Input

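hypothesis = '乌龟状态不够稳定'
references = '服务器不稳定'

# Split both strings into single characters.
hyp = list(hypothesis)
ref = list(references)

# Shared vocabulary of unique characters.
# set() ordering is arbitrary, so the IDs differ between runs.
tol = list(set(hyp + ref))

# Map each character to its index, stored as a string.
w2id = {w: str(idx) for idx, w in enumerate(tol)}
hypid = [w2id[w] for w in hyp]
refid = [w2id[w] for w in ref]

# Space-separated ID sequences, ready for a whitespace tokenizer.
hyped = ' '.join(hypid)
refed = ' '.join(refid)
print(hyped)
print(refed)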

Output

10 7 1 3 8 4 2 0
9 5 6 8 2 0
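
Because the snippet above builds its vocabulary from a set, the IDs are not reproducible. A minimal variant that numbers tokens in first-seen order is sketched here; the function name `encode_pair` is hypothetical, and it accepts any token lists (characters or segmenter output). The idea that numeric IDs are needed so the tokens survive an English-oriented tokenizer is an assumption, not something stated in this thread.

```python
def encode_pair(hyp_tokens, ref_tokens):
    """Map tokens to ID strings, numbering them in first-seen order."""
    vocab = {}

    def encode(tokens):
        # setdefault returns the existing ID, or stores the next free one.
        return ' '.join(str(vocab.setdefault(tok, len(vocab))) for tok in tokens)

    # Encode the hypothesis first so its tokens get the lowest IDs.
    return encode(hyp_tokens), encode(ref_tokens)


hyped, refed = encode_pair(list('乌龟状态不够稳定'), list('服务器不稳定'))
print(hyped)  # 0 1 2 3 4 5 6 7
print(refed)  # 8 9 10 4 6 7
```

Shared tokens (不, 稳, 定) receive the same ID in both strings, so n-gram overlap is preserved while the output stays stable across runs.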

For processing Chinese texts, see https://github.com/JialeGuo/py_rouge_zh