A Keras solution for the Chinese NER task using BiLSTM-CRF, BiGRU-CRF, IDCNN-CRF, or CRF-only models on top of BERT-family models (Google's pretrained language models; BERT, RoBERTa, and ALBERT are supported).

Update Logs

2019.11.14 bert4keras is now used as an installed package, since it no longer changes significantly. Only Google's version of the ALBERT model is supported for now.

2019.11.04 Fix bugs that produced wrong results when calculating sentence accuracy and when doing prediction.

2019.11.01 Replace crf_accuracy/crf_loss from keras-contrib with self-defined crf_accuracy/crf_loss to handle masks.

Future Work

Important: Padding items should be masked when calculating the CRF loss, or the loss will be wrong. (The crf_loss provided by keras-contrib does not handle this.)
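To illustrate why the mask matters, here is a minimal pure-Python sketch of a linear-chain CRF negative log-likelihood for a single sequence. It is not the crf_loss used in this project; the function names crf_nll and logsumexp are hypothetical, and it assumes padding only appears at the end of the sequence and that at least one step is real. With the mask applied, a padded sequence gives the same loss as the unpadded one; without it, the padded steps leak into the result.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def crf_nll(emissions, transitions, tags, mask):
    """Negative log-likelihood of one tag sequence under a linear-chain CRF.

    emissions:   T x K list of per-step tag scores
    transitions: K x K list, transitions[i][j] is the score of moving i -> j
    tags:        list of T gold tag ids
    mask:        list of T flags, 1 for real tokens and 0 for trailing padding
    """
    length = sum(mask)  # number of real (unpadded) steps, assumed >= 1
    num_tags = len(transitions)

    # Score of the gold path, counting real steps only.
    gold = emissions[0][tags[0]]
    for t in range(1, length):
        gold += transitions[tags[t - 1]][tags[t]] + emissions[t][tags[t]]

    # Log partition function via the forward algorithm, real steps only.
    alpha = list(emissions[0])
    for t in range(1, length):
        alpha = [
            logsumexp([alpha[i] + transitions[i][j] for i in range(num_tags)])
            + emissions[t][j]
            for j in range(num_tags)
        ]
    return logsumexp(alpha) - gold


if __name__ == "__main__":
    em = [[0.5, 1.0], [2.0, -1.0], [0.1, 0.3]]
    tr = [[0.2, -0.4], [1.0, 0.0]]
    tags = [1, 0, 1]
    pad_em = em + [[9.0, -9.0], [3.0, 3.0]]
    pad_tags = tags + [0, 0]
    print(crf_nll(em, tr, tags, [1, 1, 1]))               # unpadded
    print(crf_nll(pad_em, tr, pad_tags, [1, 1, 1, 0, 0]))  # padded, masked
    print(crf_nll(pad_em, tr, pad_tags, [1, 1, 1, 1, 1]))  # padded, unmasked
```

The first two printed values agree, while the third differs because the padding steps are treated as real tokens.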

This project is currently being migrated to TensorFlow 2.0, which should take a few days if I am not too busy (lol).


This project can be installed via:

pip install keras_bert_ner

to uninstall:

pip uninstall keras_bert_ner


Data Format

        "O O B I O O O B I O O O O B I O O O O O O O B I O O O O O O B I O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O B I O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O B I O O O B I O O O O O O O O O O O O O O O O O O O O O O O O O O O O B I O O O B I O O O O B I O O O O O O O B I O O O O O O B I O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O B I O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O B I O O O B I O O O O O O O O O O O O O O O O O O O O O O O O O"
        "O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O B I I I O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O B I I I O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O B I I I O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O"

See ./examples/data/train.txt. Data source: 互联网金融新实体发现 (Internet Finance New Entity Discovery).
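As a rough illustration of how tag lines like the ones above can be read, the sketch below turns a space-separated B/I/O string into entity spans. The helper name bio_spans is hypothetical and not part of keras_bert_ner; it assumes only the three tags B, I, and O occur, and it treats a stray I with no preceding B as the start of an entity.

```python
def bio_spans(tag_line):
    """Return (start, end) index pairs, end exclusive, for each entity."""
    tags = tag_line.split()
    spans = []
    start = None  # index where the current entity began, if any
    for i, tag in enumerate(tags):
        if tag == "B":
            if start is not None:
                spans.append((start, i))  # close the previous entity
            start = i
        elif tag == "I":
            if start is None:
                start = i  # stray I: assume it opens an entity
        else:  # "O" closes any open entity
            if start is not None:
                spans.append((start, i))
                start = None
    if start is not None:
        spans.append((start, len(tags)))
    return spans


if __name__ == "__main__":
    print(bio_spans("O O B I O O O B I O"))
```

For the short input in the example, the entities cover positions 2 to 3 and 7 to 8.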


Simply run python keras_bert_ner/train/ --help to see the relevant parameters for training a typical NER model. You will see the following:

(nlp) liushaoweihua@ai-server-6:~/jupyterlab/Keras-Bert-Ner$ python keras_bert_ner/train/ --help
usage: [-h] -train_data TRAIN_DATA [-dev_data DEV_DATA]
               [-save_path SAVE_PATH] [-albert] -bert_config BERT_CONFIG
               -bert_checkpoint BERT_CHECKPOINT -bert_vocab BERT_VOCAB
               [-do_eval] [-device_map DEVICE_MAP] [-best_fit]
               [-max_epochs MAX_EPOCHS]
               [-early_stop_patience EARLY_STOP_PATIENCE]
               [-reduce_lr_patience REDUCE_LR_PATIENCE]
               [-reduce_lr_factor REDUCE_LR_FACTOR] [-hard_epochs HARD_EPOCHS]
               [-batch_size BATCH_SIZE] [-max_len MAX_LEN]
               [-learning_rate LEARNING_RATE] [-model_type MODEL_TYPE]
               [-cell_type CELL_TYPE] [-rnn_units RNN_UNITS]
               [-rnn_layers RNN_LAYERS] [-cnn_filters CNN_FILTERS]
               [-cnn_kernel_size CNN_KERNEL_SIZE] [-cnn_blocks CNN_BLOCKS]
               [-crf_only] [-dropout_rate DROPOUT_RATE]

More precisely:

Data File Paths:
  Config the train/dev/test file paths
  -train_data TRAIN_DATA                     (REQUIRED) Train data path
  -dev_data DEV_DATA                         (OPTIONAL) Dev data path. Needed when -do_eval=True

Model Output Paths:
  Config the output paths for model
  -save_path SAVE_PATH                       (OPTIONAL) Model output paths

BERT File paths:
  Config the path, checkpoint and filename of a pretrained or fine-tuned BERT model
  -albert                                    (OPTIONAL) Whether to use ALBERT model. Default is False
  -bert_config BERT_CONFIG                   (REQUIRED) bert_config.json
  -bert_checkpoint BERT_CHECKPOINT           (REQUIRED) bert_model.ckpt
  -bert_vocab BERT_VOCAB                     (REQUIRED) vocab.txt

Action Configs:
  Config the actions during running
  -do_eval                                   (OPTIONAL) Evaluation mode. Default is True
  -device_map DEVICE_MAP                     (OPTIONAL) Use CPU/GPU to train. If use CPU, then 'cpu'. 
                                             If use GPU, then assign the devices, such as '0'. Default 
                                             is 'cpu'

Train Configs:
  Config the train params
  -best_fit                                  (OPTIONAL) Train best model that suits for dev.txt. 
                                             Default is False
  -max_epochs MAX_EPOCHS                     (OPTIONAL) Training epochs. Only available when 
                                             -best_fit=True. Default is 256
  -early_stop_patience EARLY_STOP_PATIENCE   (OPTIONAL) Early stop patience. Only available when
                                             -best_fit=True. Default is 3
  -reduce_lr_patience REDUCE_LR_PATIENCE     (OPTIONAL) Reduce learning rate on plateau patience.
                                             Only available when -best_fit=True. Default is 2
  -reduce_lr_factor REDUCE_LR_FACTOR         (OPTIONAL) Reduce learning rate on plateau factor.
                                             Only available when -best_fit=True. Default is 0.5
  -hard_epochs HARD_EPOCHS                   (OPTIONAL) Training epochs. Only available when
                                             -best_fit=False. Default is 10
  -batch_size BATCH_SIZE                     (OPTIONAL) Batch size. Default is 64
  -max_len MAX_LEN                           (OPTIONAL) Max sequence length. Default is 64
  -learning_rate LEARNING_RATE               (OPTIONAL) Initial adam lr. Default is 1e-5

Model Configs:
  Config the model params
  -model_type MODEL_TYPE                     (OPTIONAL) RNN models or CNN models. Default is rnn
  -cell_type CELL_TYPE                       (OPTIONAL) Cell types. If model_type='rnn', could be
                                             bilstm or bigru. If model_type='cnn', could be idcnn.
                                             Default is bilstm
  -rnn_units RNN_UNITS                       (OPTIONAL) RNN units. Only available when model_type='rnn'.
                                             Default is 128
  -rnn_layers RNN_LAYERS                     (OPTIONAL) RNN layers. Only available when model_type='rnn'.
                                             Default is 1
  -cnn_filters CNN_FILTERS                   (OPTIONAL) CNN filters. Only available when model_type='cnn'.
                                             Default is 128
  -cnn_kernel_size CNN_KERNEL_SIZE           (OPTIONAL) CNN kernel size. Only available when model_type='cnn'.
                                             Default is 3
  -cnn_blocks CNN_BLOCKS                     (OPTIONAL) IDCNN blocks. Only available when model_type='cnn'.
                                             Default is 4
  -crf_only                                  (OPTIONAL) Only use CRF-layers after BERT. Default is False
  -dropout_rate DROPOUT_RATE                 (OPTIONAL) Dropout rate. Default is 0.0
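
The interaction between -best_fit, -max_epochs, and -hard_epochs described above can be sketched with argparse as follows. This is only an illustration under assumptions, not the project's actual argument parser: build_parser and epochs_to_run are hypothetical names, and only a handful of the flags are reproduced, with the defaults taken from the help text.

```python
import argparse


def build_parser():
    """A reduced, hypothetical parser mirroring a few of the training flags."""
    parser = argparse.ArgumentParser(prog="train_sketch")
    parser.add_argument("-train_data", required=True, help="Train data path")
    parser.add_argument("-best_fit", action="store_true",
                        help="Train until early stopping instead of fixed epochs")
    parser.add_argument("-max_epochs", type=int, default=256,
                        help="Epoch limit, used only with -best_fit")
    parser.add_argument("-hard_epochs", type=int, default=10,
                        help="Fixed epoch count, used only without -best_fit")
    return parser


def epochs_to_run(args):
    """Pick the epoch budget that applies for the chosen mode."""
    return args.max_epochs if args.best_fit else args.hard_epochs


if __name__ == "__main__":
    args = build_parser().parse_args(["-train_data", "train.txt", "-best_fit"])
    print(epochs_to_run(args))
```

In this sketch, passing -best_fit switches the epoch budget from -hard_epochs to -max_epochs; early stopping would then normally end training well before the limit.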

Some Tips

If your pretrained language model is an ALBERT variant (Large/Base/Tiny), remember to add the parameter -albert.

If you do not want to add any downstream layers, like BiLSTM/BiGRU/IDCNN, simply add parameter -crf_only.

If you want the best training results, set the early-stopping and reduce-learning-rate parameters (see Train Configs), and do not forget to add the parameter -best_fit.
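
To give a feel for how the early-stopping and reduce-learning-rate parameters interact, here is a simplified pure-Python simulation. It is an assumption-laden sketch, not the Keras callbacks themselves: simulate_schedule is a hypothetical name, "improvement" means any strict decrease in validation loss, and details such as min_delta or cooldown are ignored.

```python
def simulate_schedule(val_losses, lr, early_stop_patience=3,
                      reduce_lr_patience=2, reduce_lr_factor=0.5):
    """Return a list of (epoch, learning_rate) pairs for the epochs that run."""
    best = float("inf")
    stale_for_stop = 0  # epochs without improvement, for early stopping
    stale_for_lr = 0    # epochs without improvement, for LR reduction
    history = []
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            stale_for_stop = 0
            stale_for_lr = 0
        else:
            stale_for_stop += 1
            stale_for_lr += 1
            if stale_for_lr >= reduce_lr_patience:
                lr *= reduce_lr_factor  # shrink the learning rate on a plateau
                stale_for_lr = 0
        history.append((epoch, lr))
        if stale_for_stop >= early_stop_patience:
            break  # stop training early
    return history


if __name__ == "__main__":
    losses = [1.0, 0.9, 0.95, 0.96, 0.97, 0.5]
    for epoch, lr in simulate_schedule(losses, lr=1e-5):
        print(epoch, lr)
```

With the default patience values, the learning rate is halved after two stale epochs and training stops after three, so the last loss in the example list is never reached.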


Examples can be seen in ./examples/train_example. Simply run bash to start training.

Here are two templates for rnn models and cnn models:


PRETRAINED_LM_DIR="/home1/liushaoweihua/pretrained_lm/albert_tiny_250k" # your pretrained language model path
DATA_DIR="../data" # your train/dev data path
OUTPUT_DIR="../models" # where to store the NER model

# -albert is needed because the pretrained language model here is ALBERT
# -model_type="rnn" selects the rnn model
# -cell_type is the rnn cell: can be "bilstm" or "bigru"
python \
    -train_data=${DATA_DIR}/train.txt \
    -dev_data=${DATA_DIR}/dev.txt \
    -save_path=${OUTPUT_DIR} \
    -albert \
    -bert_config=${PRETRAINED_LM_DIR}/albert_config_tiny.json \
    -bert_checkpoint=${PRETRAINED_LM_DIR}/albert_model.ckpt \
    -bert_vocab=${PRETRAINED_LM_DIR}/vocab.txt \
    -device_map="0" \
    -best_fit \
    -max_epochs=256 \
    -early_stop_patience=5 \
    -reduce_lr_patience=3 \
    -reduce_lr_factor=0.5 \
    -batch_size=64 \
    -max_len=512 \
    -model_type="rnn" \
    -cell_type="bilstm" \
    -rnn_units=128 \
    -rnn_layers=1 \
    -dropout_rate=0.1 \
    -learning_rate=5e-5


PRETRAINED_LM_DIR="/home1/liushaoweihua/pretrained_lm/albert_tiny_250k" # your pretrained language model path
DATA_DIR="../data" # your train/dev data path
OUTPUT_DIR="../models" # where to store the NER model

# -albert is needed because the pretrained language model here is ALBERT
# -model_type="cnn" selects the cnn model
# -cell_type is the cnn cell: can be "idcnn"
python \
    -train_data=${DATA_DIR}/train.txt \
    -dev_data=${DATA_DIR}/dev.txt \
    -save_path=${OUTPUT_DIR} \
    -albert \
    -bert_config=${PRETRAINED_LM_DIR}/albert_config_tiny.json \
    -bert_checkpoint=${PRETRAINED_LM_DIR}/albert_model.ckpt \
    -bert_vocab=${PRETRAINED_LM_DIR}/vocab.txt \
    -device_map="0" \
    -best_fit \
    -max_epochs=256 \
    -early_stop_patience=5 \
    -reduce_lr_patience=3 \
    -reduce_lr_factor=0.5 \
    -batch_size=64 \
    -max_len=512 \
    -model_type="cnn" \
    -cell_type="idcnn" \
    -cnn_filters=128 \
    -cnn_kernel_size=3 \
    -cnn_blocks=4 \
    -dropout_rate=0.1 \
    -learning_rate=5e-5

Logs in Training Phase



Both tag accuracy and sentence accuracy are printed during the training phase.
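
The difference between the two metrics can be sketched as follows. This is a minimal illustration rather than the project's own metric code: tag_and_sentence_accuracy is a hypothetical name, and the inputs are assumed to be unpadded tag lists, so padding never counts towards either score.

```python
def tag_and_sentence_accuracy(gold, pred):
    """Return (tag_accuracy, sentence_accuracy) for lists of tag sequences.

    Tag accuracy is the share of individual tags predicted correctly;
    sentence accuracy is the share of sentences with every tag correct.
    """
    correct_tags = 0
    total_tags = 0
    correct_sentences = 0
    for g, p in zip(gold, pred):
        matches = sum(a == b for a, b in zip(g, p))
        correct_tags += matches
        total_tags += len(g)
        if len(g) == len(p) and matches == len(g):
            correct_sentences += 1
    return correct_tags / total_tags, correct_sentences / len(gold)


if __name__ == "__main__":
    gold = [["O", "B", "I"], ["O", "O"]]
    pred = [["O", "B", "O"], ["O", "O"]]
    print(tag_and_sentence_accuracy(gold, pred))
```

A single wrong tag barely moves tag accuracy but makes the whole sentence count as wrong, which is why sentence accuracy is usually the lower and stricter number.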


Data Format


See ./examples/data/test.txt. Data source: 互联网金融新实体发现 (Internet Finance New Entity Discovery).


Simply run python keras_bert_ner/utils/ --help to see the relevant parameters. You will see the following:

(nlp) liushaoweihua@ai-server-6:~/jupyterlab/Keras-Bert-Ner$ python keras_bert_ner/utils/ --help
usage: [-h] -test_data TEST_DATA [-max_len MAX_LEN] -model_path
               MODEL_PATH -model_name MODEL_NAME [-output_path OUTPUT_PATH]
               -bert_vocab BERT_VOCAB [-device_map DEVICE_MAP]

More precisely:

Data File Paths:
  Config the train/dev/test file paths
  -test_data TEST_DATA                       (REQUIRED) Test data path
  -max_len MAX_LEN                           (OPTIONAL) Max sequence length. Default is 64

Model Output Paths:
  Config the model paths
  -model_path MODEL_PATH                     (REQUIRED) Model path
  -model_name MODEL_NAME                     (REQUIRED) Model name

Output Paths:
  Config the output paths
  -output_path OUTPUT_PATH                   (OPTIONAL) Output file paths

BERT File paths:
  Config the vocab of a pretrained or fine-tuned BERT model
  -bert_vocab BERT_VOCAB                     (REQUIRED) vocab.txt

Action Configs:
  Config the actions during running
  -device_map DEVICE_MAP                     (OPTIONAL) Use CPU/GPU to train. If use CPU, then 'cpu'. 
                                             If use GPU, then assign the devices, such as '0'. Default 
                                             is 'cpu'


Examples can be seen in ./examples/test_example. Simply run bash to start testing.

Logs in Testing Phase




Examples can be seen in ./examples/deploy_example.

Simply run bash to start deploying an API.

Then run the file usage.ipynb, or open your_ip:2601/?s=your_text in a browser, to see the result.
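
The same request can also be made from Python. The sketch below only relies on the your_ip:2601/?s=your_text pattern given above; the http scheme, the function names build_query_url and query_ner, and the idea that the response body is UTF-8 text are all assumptions, since the response format is not described here.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_query_url(host, text, port=2601):
    """Build the query URL, percent-encoding the text for the s parameter."""
    return "http://{}:{}/?{}".format(host, port, urlencode({"s": text}))


def query_ner(host, text, port=2601, timeout=10):
    """Send the text to a running deployment and return the raw response body."""
    with urlopen(build_query_url(host, text, port), timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Only prints the URL; call query_ner(...) once the API is running.
    print(build_query_url("127.0.0.1", "中文"))
```

Encoding the text with urlencode matters for Chinese input, because raw non-ASCII characters are not valid in a URL query string.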


Max Sequence Length: 512

Memory Usage (G) 3.72 0.89
Inference Time (ms) 480 300

Logs in Deploying Phase



Some Chinese Pretrained Language Models






The architecture of this repository refers to macanv's work: BERT-BiLSTM-CRF-NER.

The most important component of keras_bert_ner refers to bojone's work: bert4keras.

The pretrained language model ALBERT-Tiny, the work of BrightMart, makes it possible to run NER tasks with short inference time and relatively high accuracy.

Thanks for all these wonderful works!


