hhzrd / chinese_newword_discovery

CHINESE NEW WORD DISCOVERY (Open-source-version)

Based on Branch Entropy and Mutual Information

This method uses Mutual Information (MI) to find candidate words.
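
As a rough illustration of the MI step, the sketch below scores every adjacent character pair by pointwise mutual information. It is a minimal example under stated assumptions, not the repository's code: the function name `pmi_scores`, the restriction to two-character candidates, and the log base 2 are all hypothetical choices made here.

```python
import math
from collections import Counter


def pmi_scores(text):
    """Return {pair: PMI} for every adjacent character pair in text.

    Hypothetical helper for illustration; not taken from the repository.
    PMI(xy) = log2( P(xy) / (P(x) * P(y)) )
    """
    char_counts = Counter(text)
    pair_counts = Counter(text[i:i + 2] for i in range(len(text) - 1))
    total_chars = sum(char_counts.values())
    total_pairs = sum(pair_counts.values())

    scores = {}
    for pair, count in pair_counts.items():
        p_xy = count / total_pairs
        p_x = char_counts[pair[0]] / total_chars
        p_y = char_counts[pair[1]] / total_chars
        scores[pair] = math.log2(p_xy / (p_x * p_y))
    return scores


if __name__ == "__main__":
    for pair, score in sorted(pmi_scores("abab").items()):
        print(pair, round(score, 3))
```

A pair whose characters co-occur more often than their individual frequencies predict gets a higher score, which is why high-MI strings are treated as word candidates. On a real corpus, a minimum frequency threshold is usually needed as well, since raw PMI tends to favor rare pairs.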

It uses left and right branch entropy to check word boundaries.
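
The boundary check can be sketched as follows: collect the characters that appear immediately to the left and right of each occurrence of a candidate, then compute the Shannon entropy of each neighbor distribution. This is a simplified illustration, assuming a plain string scan; the names `branch_entropy` and `_entropy` are hypothetical and do not come from the repository.

```python
import math
from collections import Counter


def _entropy(counter):
    """Shannon entropy (base 2) of a Counter of neighbor characters."""
    total = sum(counter.values())
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in counter.values())


def branch_entropy(text, candidate):
    """Return (left_entropy, right_entropy) of candidate within text.

    Hypothetical helper for illustration; not taken from the repository.
    """
    left, right = Counter(), Counter()
    start = text.find(candidate)
    while start != -1:
        if start > 0:
            left[text[start - 1]] += 1       # character just before the match
        end = start + len(candidate)
        if end < len(text):
            right[text[end]] += 1            # character just after the match
        start = text.find(candidate, start + 1)
    return _entropy(left), _entropy(right)


if __name__ == "__main__":
    sample = "吃葡萄不吐葡萄皮"
    print(branch_entropy(sample, "葡萄"))
    print(branch_entropy(sample, "葡"))
```

In the sample string, "葡萄" is followed by two different characters, while "葡" is always followed by "萄", so the right entropy of "葡" is zero. A low entropy on either side suggests the candidate is a fragment of a longer word and not a word on its own.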

This is the open-source version. You can improve the calculation speed by optimizing the data structures.

Python version and packages you may need to install:

python 3.6

tqdm

numpy
