hankcs / ID-CNN-CWS

Source code and corpora for the paper "Iterated Dilated Convolutions for Chinese Word Segmentation"

ID-CNN-CWS

Source code and corpora for the paper "Iterated Dilated Convolutions for Chinese Word Segmentation", published in the NNW journal.

It implements the following four models for Chinese word segmentation (CWS):

  • Bi-LSTM
  • Bi-LSTM-CRF
  • ID-CNN
  • ID-CNN-CRF
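
As a rough illustration of what "dilated" means in the ID-CNN models, here is a minimal pure-Python sketch of a 1-D dilated convolution and of how stacking such layers widens the receptive field. It is not the code in this repository: the kernel width of 3 and the dilation schedule `[1, 2, 4]` are assumptions chosen for the example, and `dilated_conv1d` and `receptive_field` are hypothetical helper names.

```python
def dilated_conv1d(xs, kernel, dilation):
    """Apply a 1-D dilated convolution with zero padding (same-length output)."""
    half = len(kernel) // 2
    out = []
    for i in range(len(xs)):
        total = 0.0
        for k, w in enumerate(kernel):
            j = i + (k - half) * dilation  # kernel taps are spaced `dilation` apart
            if 0 <= j < len(xs):
                total += w * xs[j]
        out.append(total)
    return out


def receptive_field(kernel_width, dilations):
    """Input positions that can influence one output after stacking the layers."""
    return 1 + sum((kernel_width - 1) * d for d in dilations)


# Assumed dilation schedule for illustration; the repository's rates may differ.
DILATIONS = [1, 2, 4]

# Push a unit impulse through the stack and count how far it spreads.
signal = [0.0] * 15
signal[7] = 1.0
for d in DILATIONS:
    signal = dilated_conv1d(signal, [1.0, 1.0, 1.0], d)

touched = sum(1 for v in signal if v != 0.0)
print(receptive_field(3, DILATIONS), touched)
```

With three layers of width 3, the receptive field grows to 15 characters, whereas three undilated layers would cover only 7.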

Dependencies

  • Python >= 3.6
  • TensorFlow >= 1.2

Both CPU and GPU are supported. Training on a GPU is about 10 times faster than on a CPU.

Preparation

Run the following script to convert the corpora to TensorFlow datasets.

$ ./scripts/make.sh
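
Segmentation corpora are commonly turned into per-character tag sequences before training. The sketch below shows one such conversion, assuming whitespace-separated words and a BMES tagging scheme (Begin, Middle, End, Single). This README does not say which scheme `make.sh` uses, so treat this as an illustration only; `words_to_tags` and `tags_to_words` are hypothetical names.

```python
def words_to_tags(words):
    """Map a list of segmented words to (characters, BMES tags)."""
    chars, tags = [], []
    for word in words:
        if not word:
            continue
        chars.extend(word)
        if len(word) == 1:
            tags.append("S")  # single-character word
        else:
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return chars, tags


def tags_to_words(chars, tags):
    """Inverse mapping: rebuild words from characters and BMES tags."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):  # a word ends here
            words.append(current)
            current = ""
    if current:  # tolerate a sequence that ends mid-word
        words.append(current)
    return words


line = "我 喜欢 自然语言处理"
chars, tags = words_to_tags(line.split())
print(list(zip(chars, tags)))
print(tags_to_words(chars, tags))
```

The round trip through `tags_to_words` recovers the original segmentation, which is a convenient sanity check for this kind of preprocessing.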

Train and Test

Quick Start

$ ./scripts/run.sh $dataset $model
  • $dataset can be pku, msr, asSC or cityuSC.
  • $model can be cnn or bilstm.

For example:

$ ./scripts/run.sh pku cnn

This trains a cnn model on the pku dataset, then evaluates it on the test set.
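
To run every dataset and model combination, the commands can be generated rather than typed by hand. The following sketch only builds and prints the command lines, using the dataset and model names listed above; `build_commands` is a hypothetical helper, not part of this repository.

```python
import itertools

DATASETS = ["pku", "msr", "asSC", "cityuSC"]
MODELS = ["cnn", "bilstm"]


def build_commands(datasets=DATASETS, models=MODELS, viterbi=False):
    """Return one argument list per (dataset, model) pair."""
    commands = []
    for dataset, model in itertools.product(datasets, models):
        cmd = ["./scripts/run.sh", dataset, model]
        if viterbi:
            cmd.append("--viterbi")  # the CRF flag documented in this README
        commands.append(cmd)
    return commands


for cmd in build_commands():
    print(" ".join(cmd))
```

To actually launch the runs, each list could be passed to `subprocess.run(cmd, check=True)` from the repository root.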

CRF Layer

To enable the CRF layer, append --viterbi to your command, e.g.

$ ./scripts/run.sh pku cnn --viterbi
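
The flag name suggests that predictions are decoded with the Viterbi algorithm. As an illustration of why that helps, here is a minimal pure-Python Viterbi decoder over BMES tags. It is a sketch under stated assumptions, not the repository's implementation: the BMES tag set, the 0 / -inf transition scores, and the emission scores are all made up for the example, and a trained CRF would learn real-valued transition scores instead.

```python
TAGS = ["B", "M", "E", "S"]
NEG_INF = float("-inf")

# Assumed transition scores: 0 for legal BMES moves, -inf for illegal ones.
LEGAL = {("B", "M"), ("B", "E"), ("M", "M"), ("M", "E"),
         ("E", "B"), ("E", "S"), ("S", "B"), ("S", "S")}
TRANS = {(p, c): (0.0 if (p, c) in LEGAL else NEG_INF)
         for p in TAGS for c in TAGS}


def viterbi(emissions, trans=TRANS, tags=TAGS):
    """Return the highest-scoring tag sequence.

    emissions: list of dicts, one per character, mapping tag -> score.
    Start and end constraints are omitted to keep the sketch short.
    """
    if not emissions:
        return []
    score = dict(emissions[0])
    backptrs = []
    for emit in emissions[1:]:
        new_score, ptr = {}, {}
        for cur in tags:
            best_prev = max(tags, key=lambda p: score[p] + trans[(p, cur)])
            new_score[cur] = score[best_prev] + trans[(best_prev, cur)] + emit[cur]
            ptr[cur] = best_prev
        score = new_score
        backptrs.append(ptr)
    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda t: score[t])]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return path


# Made-up emission scores for a three-character sentence.
emissions = [
    {"B": 2.0, "M": 0.0, "E": 0.0, "S": 1.0},
    {"B": 1.5, "M": 0.0, "E": 1.0, "S": 0.0},
    {"B": 0.0, "M": 0.0, "E": 0.0, "S": 2.0},
]
greedy = [max(TAGS, key=e.get) for e in emissions]
print("greedy: ", greedy)
print("viterbi:", viterbi(emissions))
```

Picking the best tag per character independently yields B, B, S here, which is not a valid segmentation because B cannot follow B. Decoding the sequence jointly returns B, E, S instead.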

Accuracy

[Figure: accuracy results (screenshot 2017-10-20_13-25-11)]

Speed

[Figure: speed results (screenshot 2017-10-20_11-44-42)]

Acknowledgments

About

License: GNU General Public License v3.0


Languages

  • Python 87.8%
  • Shell 7.3%
  • Perl 5.0%