wang9702/BERT-NER-Pytorch

Chinese NER using Bert

BERT for Chinese NER.

Input format (BIOS tag scheme preferred): one character and its label per line. Sentences are separated by a blank line.

美	B-LOC
国	I-LOC
的	O
华	B-PER
莱	I-PER
士	I-PER

我	O
跟	O
他	O

Note: expected file structure of the pretrained model:

├── prev_trained_model
|  └── bert_base
|  |  └── pytorch_model.bin
|  |  └── config.json
|  |  └── vocab.txt
|  |  └── ......
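Before training, it can help to confirm that the directory matches the layout above. This is a small sketch using only the standard library; the helper name `missing_model_files` is hypothetical, and it checks only the three files named in the tree (the `......` entry is left unspecified).

```python
from pathlib import Path
from typing import List

# The three files explicitly listed in the tree above.
REQUIRED_FILES = ("pytorch_model.bin", "config.json", "vocab.txt")


def missing_model_files(model_dir: str) -> List[str]:
    """Return the names of required files that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_model_files("prev_trained_model/bert_base")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All expected model files are present.")
```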

The overall performance of BERT on dev:

The overall performance of ALBERT on dev:

model	version	Precision(entity)	Recall(entity)	F1(entity)	Train time/epoch (relative to bert)
albert	base_google	0.8014	0.6908	0.7420	0.75x
albert	large_google	0.8024	0.7520	0.7763	2.1x
albert	xlarge_google	0.8286	0.7773	0.8021	6.7x
bert	google	0.8118	0.8031	0.8074	-----
albert	base_bright	0.8068	0.7529	0.7789	0.75x
albert	large_bright	0.8152	0.7480	0.7802	2.2x
albert	xlarge_bright	0.8222	0.7692	0.7948	7.3x
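The F1 column in the table is consistent with the harmonic mean of the first metric column and recall, which suggests that the first column is entity-level precision. The sketch below recomputes F1 for a few rows copied from the table; the helper name `f1_score` is hypothetical, and the tolerance allows for rounding in the reported four-digit values.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (first metric column, recall, reported F1), copied from the table above.
rows = {
    "albert base_google": (0.8014, 0.6908, 0.7420),
    "albert large_google": (0.8024, 0.7520, 0.7763),
    "bert google": (0.8118, 0.8031, 0.8074),
}

for name, (p, r, reported) in rows.items():
    computed = f1_score(p, r)
    # Agreement within rounding error of the reported value.
    matches = abs(computed - reported) < 5e-4
    print(f"{name}: computed F1 = {computed:.4f}, reported = {reported:.4f}, match = {matches}")
```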

The overall performance of BERT on dev(test):

MIT License
