yihong-chen / chinese-word-segmentation

Simple Chinese word segmentation with experiments on the PKU dataset

Methods

  • Pattern-based word segmentation
  • CRF++ tagging
  • LSTM tagging
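
Both tagging methods treat segmentation as per-character sequence labeling. The repository page does not say which tag set it uses, so the sketch below assumes the common B/M/E/S scheme (begin, middle, end, single); the function names `words_to_tags` and `tags_to_words` are hypothetical and only illustrate how segmented text could be converted to tags for training and how predicted tags could be turned back into words.

```python
def words_to_tags(words):
    """Map a list of segmented words to one B/M/E/S tag per character.

    Assumed tag set (not confirmed by the repository):
    B = first character of a multi-character word, M = middle,
    E = last character, S = single-character word.
    """
    tags = []
    for word in words:
        if not word:
            continue  # ignore empty tokens
        if len(word) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


def tags_to_words(chars, tags):
    """Rebuild words from characters and their (predicted) tags."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):  # a word ends on E or S
            words.append(current)
            current = ""
    if current:  # tolerate a tag sequence that stops mid-word
        words.append(current)
    return words


if __name__ == "__main__":
    gold = ["北京", "大学", "的", "学生"]
    tags = words_to_tags(gold)
    print(tags)
    print(tags_to_words("".join(gold), tags))
```

A CRF++ or LSTM tagger would then be trained to predict one such tag per character, and `tags_to_words` would decode its output into a segmentation.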

Performance

F1

  • Pattern-based segmentation: 0.87
  • CRF++ tagging: 0.93
  • LSTM tagging: 0.86
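
The page does not state how these F1 scores were computed. A minimal sketch, assuming the usual word-level definition for segmentation (a predicted word counts as correct only if both of its boundaries match a gold word), is shown below; `to_spans` and `segmentation_f1` are hypothetical names, not functions from the repository.

```python
def to_spans(words):
    """Convert a word sequence into a set of (start, end) character spans."""
    spans, start = set(), 0
    for word in words:
        end = start + len(word)
        spans.add((start, end))
        start = end
    return spans


def segmentation_f1(gold_words, pred_words):
    """Word-level F1 between a gold and a predicted segmentation
    of the same character sequence."""
    gold, pred = to_spans(gold_words), to_spans(pred_words)
    correct = len(gold & pred)  # words whose two boundaries both match
    if correct == 0:
        return 0.0
    precision = correct / len(pred)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = ["北京", "大学", "的", "学生"]
    pred = ["北京大学", "的", "学生"]
    # 2 of 3 predicted words are correct; 2 of 4 gold words are recovered.
    print(round(segmentation_f1(gold, pred), 4))
```

In this toy case precision is 2/3 and recall is 1/2, giving F1 = 4/7. Corpus-level scores would typically pool the counts over all sentences before computing precision and recall.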

The simple LSTM tagger (F1 0.86) does not outperform CRF++ (0.93), and it even falls slightly behind pattern-based segmentation (0.87).

Tips for improving the performance of the LSTM tagger on the segmentation task

Languages

  • Jupyter Notebook: 86.6%
  • Python: 12.2%
  • Perl: 1.2%