Alibaba-NLP / DAAT-CWS

Coupling Distant Annotation and Adversarial Training for Cross-Domain Chinese Word Segmentation

DAAT-CWS

Paper accepted by ACL 2020

Prerequisites

  • python == 2.7
  • tensorflow == 1.8.0
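
A quick way to confirm that an environment matches the pinned versions above is sketched below. The helper name find_mismatches is hypothetical and not part of this repository; only the two version pins come from the list above.

```python
import sys

# Version pins taken from the Prerequisites list.
REQUIRED_PYTHON = (2, 7)
REQUIRED_TENSORFLOW = "1.8.0"


def find_mismatches(python_version, tensorflow_version):
    """Return mismatch messages; the list is empty when both pins are met.

    Hypothetical helper for illustration, not part of the repository.
    """
    problems = []
    major_minor = tuple(python_version[:2])
    if major_minor != REQUIRED_PYTHON:
        problems.append("python %d.%d found, 2.7 required" % major_minor)
    if tensorflow_version != REQUIRED_TENSORFLOW:
        problems.append("tensorflow %s found, %s required"
                        % (tensorflow_version, REQUIRED_TENSORFLOW))
    return problems


if __name__ == "__main__":
    try:
        import tensorflow as tf
        tf_version = tf.__version__
    except ImportError:
        tf_version = "not installed"
    for message in find_mismatches(sys.version_info, tf_version):
        print(message)
```

The script prints nothing when both versions match.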

Dataset

The source-domain dataset (PKU) and the five distantly annotated target-domain datasets are in the data/datasets directory.

Usage

Run python train.py --tgt_train_path <tgt_train_path> --tgt_test_path <tgt_test_path>
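
To train on several target domains in turn, the command above can be assembled from Python. This is a minimal sketch: build_train_command is a hypothetical helper, and the file paths are placeholders to be replaced with real files under data/datasets. Only the script name and the two flags come from the command above.

```python
import subprocess


def build_train_command(tgt_train_path, tgt_test_path, python_bin="python"):
    """Return the argument list for one run of train.py.

    Hypothetical helper; it only uses the two flags shown in Usage.
    """
    return [
        python_bin, "train.py",
        "--tgt_train_path", tgt_train_path,
        "--tgt_test_path", tgt_test_path,
    ]


if __name__ == "__main__":
    # Placeholder paths: substitute the actual target-domain files.
    cmd = build_train_command("<tgt_train_path>", "<tgt_test_path>")
    print(" ".join(cmd))
    # Uncomment to launch training once the paths point at real files:
    # subprocess.check_call(cmd)
```

Passing the arguments as a list avoids shell quoting problems when a path contains spaces.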

Note:

This code is based on previous work by chqiwang, to whom many thanks are due. The raw text of the datasets used in our paper can be found at CWS-NAACL2019.

License: Apache License 2.0


Languages

Python 90.8%, Perl 9.2%