aarasteh / CWS

Source code for an ACL2016 paper of Chinese word segmentation

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

#CWS

This code implements the word segmentation algorithm proposed in the following paper.

Deng Cai and Hai Zhao, Neural Word Segmentation Learing for Chinese. ACL 2016.

##Update! a faster implementation using dynet as backend is now available. ##python train.py -d to use the new (dynet based) version.

##Usage (old, also helpful to new version): ###- train python train.py -t. To train a model, first check the hyperparameter settings in train.py. The training procedure will result a config file at the very beginning in which your hyperparameter settings are preserved, and output the trained model parameters to *.npz per epoch.

###- test python test.py params.npz input_file output_path config_file. To test a trained model, specify the file that stores the model parameters as params.npz as well as the corresponding configuration file config_file. The test procedure will read data from input_file and output result to output_path.

###- evaluate
For example, To see the best result (F1-score 95.5) on PKU dataset reported in our paper, first generate the output file through the trained model ( python test.py best_pku.npz ../data/pku_test somepath best_pku_config), then use the command ./score ../data/dic ../data/pku_test somepath.

##Dependencies: Thanks to those excellent computing tools: Dynet, Theano, Numpy, Gensim.

##Author: Deng Cai. Any question, feel free to contact me through my email.

##Citation: If you find this code useful, please cite our paper.

@InProceedings{cai-zhao:2016:P16-1,
  author    = {Cai, Deng  and  Zhao, Hai},
  title     = {Neural Word Segmentation Learning for Chinese},
  booktitle = {Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  month     = {August},
  year      = {2016},
  address   = {Berlin, Germany},
  publisher = {Association for Computational Linguistics},
  pages     = {409--420},
  url       = {http://www.aclweb.org/anthology/P16-1039}
}

About

Source code for an ACL2016 paper of Chinese word segmentation


Languages

Language:Python 85.3%Language:Perl 14.4%Language:Shell 0.3%