can this module perform at a state-of-the-art level?
the F1 score is near SOTA with GloVe(100)+ELMo+CNN(char)+BiLSTM+CRF.
92.65% (best), 92.45% (average, 10 runs), experiments 10, test 15
how can the BiLSTM be made faster?
the solution is LSTMBlockFusedCell().
3.13 times faster than LSTMCell() at training time.
1.26 times faster than LSTMCell() at inference time.
can the Transformer achieve results competitive with the BiLSTM? and how much faster is it?
contextual encoding by the Transformer encoder yields competitive results.
for sequence-to-sequence models such as translation, the multi-head attention mechanism might be very powerful for alignment.
however, for sequence tagging, the power comes from the point-wise feed-forward net with a wide range of kernel sizes, not from multi-head attention alone.
if you use kernel size 1, the performance will be much worse than you expect.
it seems that the point-wise feed-forward net collects contextual information layer by layer.
this is very similar to a hierarchical convolutional neural network.
I'd like to say: Attention is Not All you need.
see the evaluation results below.
a multi-layer BiLSTM using LSTMBlockFusedCell() is slightly faster than the Transformer with 4 layers on GPU.
moreover, the BiLSTM is 2 times faster in a multi-threaded CPU environment than on GPU.
LSTMBlockFusedCell() is well optimized for multi-core CPUs via multi-threading.
I guess there might be overhead when copying between GPU memory and main memory.
the BiLSTM is 3 to 4 times faster than the Transformer version on 1 CPU (single-thread).
at inference time, a 1-layer BiLSTM on 1 CPU takes just 4.2 msec per sentence on average.
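The kernel-size remark above can be checked with simple arithmetic. The sketch below assumes the feed-forward sublayer is built from stride-1, undilated 1-D convolutions, one per layer (an assumption made for illustration; the function name `receptive_field` is hypothetical). It counts how many input tokens a single output position can see through the convolutions alone, ignoring attention.

```python
def receptive_field(kernel_size: int, num_layers: int) -> int:
    """Tokens visible to one output position after stacking stride-1,
    undilated 1-D convolutions (attention is ignored here)."""
    if kernel_size < 1 or num_layers < 0:
        raise ValueError("kernel_size must be >= 1 and num_layers >= 0")
    # Each layer widens the window by (kernel_size - 1) tokens.
    return 1 + num_layers * (kernel_size - 1)


if __name__ == "__main__":
    # Print the window size for 1 to 4 stacked layers per kernel size.
    for k in (1, 3, 5):
        print(k, [receptive_field(k, n) for n in range(1, 5)])
```

With kernel size 1 the window never grows beyond the current token, however many layers are stacked, which is consistent with the observation that the feed-forward net then contributes no context of its own.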
how to use a trained model from C++? is it much faster?
freeze the model, convert it to memory-mapped format and load it via the TensorFlow C++ API.
a 1-layer BiLSTM on multiple CPUs takes 2.04 msec per sentence on average.
a 1-layer BiLSTM on a single CPU takes 2.68 msec per sentence on average.
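Figures such as "msec per sentence" and "N times faster" can be reproduced with a small timing harness. The following is only a sketch: `msec_per_sentence` and `speedup` are hypothetical helper names, and the `tag` callable stands in for whatever function runs the real model on one sentence.

```python
import time
from typing import Callable, Sequence


def msec_per_sentence(tag: Callable[[str], object], sentences: Sequence[str]) -> float:
    """Average wall-clock milliseconds spent in `tag` per sentence."""
    if not sentences:
        raise ValueError("need at least one sentence")
    start = time.perf_counter()
    for sentence in sentences:
        tag(sentence)
    elapsed = time.perf_counter() - start
    return 1000.0 * elapsed / len(sentences)


def speedup(baseline_msec: float, candidate_msec: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_msec / candidate_msec


if __name__ == "__main__":
    # Stand-in tagger (assumption); replace with a call into the real model.
    def dummy_tagger(sentence: str):
        return sentence.split()

    sents = ["Obama left office in January 2017 ."] * 1000
    print("%.4f msec/sentence" % msec_per_sentence(dummy_tagger, sents))
```

Measuring over many sentences and excluding model loading from the timed region keeps one-off startup cost out of the average.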
$ cd etagger
$ ls embeddings
embeddings/elmo_2x4096_512_2048cnn_2xhighway_5.5B_options.json embeddings/elmo_2x4096_512_2048cnn_2xhighway_5.5B_weights.hdf5
$ cd etagger
$ git clone https://github.com/google-research/bert.git
edit bert/modeling.py
* we do not use the Estimator API, so change bool to tf.bool for is_training.
is_training: bool. true for training model, false for eval model. Controls
->
is_training: tf.bool. true for training model, false for eval model. Controls
...
if not is_training:
config.hidden_dropout_prob = 0.0
config.attention_probs_dropout_prob = 0.0
->
config.hidden_dropout_prob = tf.cond(is_training, lambda: config.hidden_dropout_prob, lambda: 0.0)
config.attention_probs_dropout_prob = tf.cond(is_training, lambda: config.attention_probs_dropout_prob, lambda: 0.0)
$ python inference.py --mode line --emb_path embeddings/glove.6B.100d.txt.pkl --wrd_dim 100 --restore checkpoint/ner_model
...
Obama left office in January 2017 with a 60% approval rating and currently resides in Washington, D.C.
Obama NNP O O B-PER
left VBD O O O
office NN O O O
in IN O O O
January NNP O B-DATE O
2017 CD O I-DATE O
with IN O O O
a DT O O O
60 CD O B-PERCENT O
% NN O I-PERCENT O
approval NN O O O
rating NN O O O
and CC O O O
currently RB O O O
resides VBZ O O O
in IN O O O
Washington NNP O B-GPE B-LOC
, , O I-GPE O
D.C. NNP O B-GPE B-LOC
The Beatles were an English rock band formed in Liverpool in 1960.
The DT O O O
Beatles NNPS O B-PERSON B-MISC
were VBD O O O
an DT O O O
English JJ O B-LANGUAGE B-MISC
rock NN O O O
band NN O O O
formed VBN O O O
in IN O O O
Liverpool NNP O B-GPE B-LOC
in IN O O O
1960 CD O B-DATE O
. . O I-DATE O
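The tagged output above can be turned into entity spans by grouping the BIO tags. The sketch below assumes that the first column is the token and the last column is the model's prediction; `extract_entities` is a hypothetical helper, not part of the repository.

```python
from typing import Iterable, List, Optional, Tuple


def extract_entities(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Group BIO tags from the last column into (text, type) spans."""
    entities: List[Tuple[str, str]] = []
    tokens: List[str] = []
    etype: Optional[str] = None
    for line in lines:
        cols = line.split()
        if cols:
            word, tag = cols[0], cols[-1]
        else:
            word, tag = "", "O"  # a blank line ends any open span
        if tag.startswith("I-") and etype == tag[2:]:
            tokens.append(word)  # continue the currently open span
            continue
        if tokens:
            # Any other tag closes the open span.
            entities.append((" ".join(tokens), etype))
            tokens, etype = [], None
        if tag.startswith(("B-", "I-")):
            # A stray I- tag is treated leniently as the start of a span.
            tokens, etype = [word], tag[2:]
    if tokens:
        entities.append((" ".join(tokens), etype))
    return entities


if __name__ == "__main__":
    sample = [
        "Washington NNP O B-GPE B-LOC",
        ", , O I-GPE O",
        "D.C. NNP O B-GPE B-LOC",
    ]
    print(extract_entities(sample))
```

Reading only the first and last columns keeps the helper independent of how many feature columns sit in between.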
inference(bucket) using frozen model, tensorRT, C++
* create virtual env `python -m venv python3.6_tfsrc` and activate it.
$ python -m venv python3.6_tfsrc
$ source /home/python3.6_tfsrc/bin/activate
* build tensorflow from source.
$ git clone https://github.com/tensorflow/tensorflow.git tensorflow-src-cpu
$ cd tensorflow-src-cpu
* you should check out the same TensorFlow version as the pip package used for training.
$ git checkout r1.11
* modify a source file for memory mapped graph(convert_graphdef_memmapped_format)
./tensorflow/core/platform/posix/posix_file_system.cc: mmap(nullptr, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0); in 'NewReadOnlyMemoryRegionFromFile'
MAP_PRIVATE -> MAP_SHARED
* configure without CUDA
$ ./configure
* build pip package with optimizations for FMA, AVX and SSE( https://medium.com/@sometimescasey/building-tensorflow-from-source-for-sse-avx-fma-instructions-worth-the-effort-fbda4e30eec3 ).
$ python -m pip install --upgrade pip
$ python -m pip install --upgrade setuptools
$ bazel build -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both --copt=-msse4.2 //tensorflow/tools/pip_package:build_pip_package
$ bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
* install pip package
$ python -m pip install /tmp/tensorflow_pkg/tensorflow-1.11.0-cp36-cp36m-linux_x86_64.whl
* build libraries and binaries we need.
$ bazel build -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both --copt=-msse4.2 //tensorflow:libtensorflow.so
$ bazel build -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both --copt=-msse4.2 //tensorflow:libtensorflow_cc.so
$ bazel build -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both --copt=-msse4.2 //tensorflow/python/tools:optimize_for_inference
$ bazel build -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both --copt=-msse4.2 //tensorflow/tools/quantization:quantize_graph
$ bazel build -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both --copt=-msse4.2 //tensorflow/contrib/util:convert_graphdef_memmapped_format
$ bazel build -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both --copt=-msse4.2 //tensorflow/tools/graph_transforms:transform_graph
* copy libraries to dist directory, export dist and includes directory.
$ export TENSORFLOW_SOURCE_DIR='/home/tensorflow-src-cpu'
$ export TENSORFLOW_BUILD_DIR='/home/tensorflow-dist-cpu'
$ cp -rf ${TENSORFLOW_SOURCE_DIR}/bazel-bin/tensorflow/*.so ${TENSORFLOW_BUILD_DIR}/
* for LSTMBlockFusedCell()
$ rnn_path=`python -c "import tensorflow; print(tensorflow.contrib.rnn.__path__[0])"`
$ rnn_ops_lib=${rnn_path}/python/ops/_lstm_ops.so
$ cp -rf ${rnn_ops_lib} ${TENSORFLOW_BUILD_DIR}
$ export LD_LIBRARY_PATH=${TENSORFLOW_BUILD_DIR}:$LD_LIBRARY_PATH
* for QRNN [optional]
$ qrnn_path=`python -c "import tensorflow as tf; print(tf.__path__[0])"`
$ qrnn_lib=${qrnn_path}/../qrnn_lib.cpython-36m-x86_64-linux-gnu.so
$ cp -rf ${qrnn_lib} ${TENSORFLOW_BUILD_DIR}
.bashrc sample
# tensorflow so, header dist
export TENSORFLOW_SOURCE_DIR='/home/tensorflow-src-cpu'
export TENSORFLOW_BUILD_DIR='/home/tensorflow-dist-cpu'
# for loading _lstm_ops.so, qrnn_lib.cpython-36m-x86_64-linux-gnu.so
export LD_LIBRARY_PATH=${TENSORFLOW_BUILD_DIR}:$LD_LIBRARY_PATH
test build sample model and inference by C++
$ cd /home/etagger
* build and save sample model
$ cd inference
$ python train_sample.py
* inference using python
$ python python/inference_sample.py
* inference using c++
* edit etagger/inference/cc/CMakeLists.txt
find_package(TensorFlow 1.11 EXACT REQUIRED)
$ cd etagger/inference/cc
$ mkdir build
$ cd build
$ cmake ..
$ make
$ cd ../..
$ ./cc/build/inference_sample
test build iris model, freezing and inference by C++
$ cd /home/etagger
* build and save iris model
$ cd inference
$ python train_iris.py
* freeze graph
$ python freeze.py --model_dir exported --output_node_names logits --frozen_model_name iris_frozen.pb
* inference using python
$ python python/inference_iris.py
* inference using C++
* edit etagger/inference/cc/CMakeLists.txt
find_package(TensorFlow 1.11 EXACT REQUIRED)
$ cd etagger/inference/cc
$ mkdir build
$ cd build
$ cmake ..
$ make
$ cd ../..
$ ./cc/build/inference_iris
export etagger model, freezing and inference by C++
$ cd inference
* let's assume that we have a saved model.
* note for BiLSTM with LSTMBlockFusedCell(): import_meta_graph() may fail because it can't find `BlockLSTM`.
  * similar issue => https://stackoverflow.com/questions/50298058/restore-trained-tensorflow-model-keyerror-blocklstm
  * how to fix? => https://github.com/tensorflow/tensorflow/issues/23369
  * what about C++? => https://stackoverflow.com/questions/50475320/executing-frozen-tensorflow-graph-that-uses-tensorflow-contrib-resampler-using-c
  * we can load '_lstm_ops.so' for LSTMBlockFusedCell().
* restore the model to check the list of operations, placeholders and tensors for mapping, then export it to another location.
$ python export.py --restore ../checkpoint/ner_model --export exported/ner_model --export-pb exported
* freeze graph
$ python freeze.py --model_dir exported --output_node_names logits_indices,sentence_lengths --frozen_model_name ner_frozen.pb
$ ln -s ../embeddings embeddings
* inference using python
$ python python/inference.py --emb_path embeddings/glove.6B.100d.txt.pkl --wrd_dim 100 --frozen_path exported/ner_frozen.pb < ../data/test.txt > pred.txt
$ python python/inference.py --emb_path embeddings/glove.6B.300d.txt.pkl --wrd_dim 300 --frozen_path exported/ner_frozen.pb < ../data/test.txt > pred.txt
$ python python/inference.py --emb_path embeddings/glove.840B.300d.txt.pkl --wrd_dim 300 --frozen_path exported/ner_frozen.pb < ../data/test.txt > pred.txt
* you may need to modify build_input_feed_dict() in 'python/inference.py' for emb_class='bert',
* since some input tensors might not exist in the frozen graph, e.g. 'input_data_chk_ids'.
* inference using python with a graph_def optimized via TensorRT (GPU only)
$ python python/inference_trt.py --emb_path embeddings/glove.6B.100d.txt.pkl --wrd_dim 100 --frozen_path exported/ner_frozen.pb < ../data/test.txt > pred.txt
$ python python/inference_trt.py --emb_path embeddings/glove.6B.300d.txt.pkl --wrd_dim 300 --frozen_path exported/ner_frozen.pb < ../data/test.txt > pred.txt
$ python python/inference_trt.py --emb_path embeddings/glove.840B.300d.txt.pkl --wrd_dim 300 --frozen_path exported/ner_frozen.pb < ../data/test.txt > pred.txt
* check `pred.txt` to confirm that the predictions are the same.
$ python ../token_eval.py < pred.txt
* for inference by C++, only emb_class='glove' is implemented.
* inference using C++
$ ./cc/build/inference exported/ner_frozen.pb embeddings/vocab.txt < ../data/test.txt > pred.txt
* check `pred.txt` to confirm that the predictions are the same.
$ python ../token_eval.py < pred.txt
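Besides comparing evaluation scores, the predictions of two runs can be compared token by token. The sketch below assumes each non-blank line of a prediction file ends with the predicted tag; `predicted_tags` and `diff_predictions` are hypothetical helpers, not scripts from the repository.

```python
from typing import Iterable, List, Tuple


def predicted_tags(lines: Iterable[str]) -> List[str]:
    """Last column of every non-blank line."""
    return [line.split()[-1] for line in lines if line.strip()]


def diff_predictions(a: Iterable[str], b: Iterable[str]) -> List[Tuple[int, str, str]]:
    """Return (token_index, tag_a, tag_b) for every token where the runs disagree."""
    tags_a, tags_b = predicted_tags(a), predicted_tags(b)
    if len(tags_a) != len(tags_b):
        raise ValueError("token counts differ: %d vs %d" % (len(tags_a), len(tags_b)))
    return [(i, x, y) for i, (x, y) in enumerate(zip(tags_a, tags_b)) if x != y]


if __name__ == "__main__":
    # In-memory stand-ins for two prediction files.
    run_python = ["Obama NNP O O B-PER", "left VBD O O O", ""]
    run_cpp = ["Obama NNP O O B-PER", "left VBD O O O", ""]
    print(diff_predictions(run_python, run_cpp))
```

To use it on real output, keep the Python and C++ results under different names (for example `pred_py.txt` and `pred_cc.txt`, both hypothetical) and pass the two open file objects; an empty list means the predictions match.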
optimizing graph for inference, convert it to memory mapped format and inference by C++
$ cd inference
* optimize graph for inference
# not working properly
$ ${TENSORFLOW_SOURCE_DIR}/bazel-bin/tensorflow/python/tools/optimize_for_inference --input=exported/ner_frozen.pb --output=exported/ner_frozen.pb.optimized --input_names=is_train,sentence_length,input_data_pos_ids,input_data_chk_ids,input_data_word_ids,input_data_wordchr_ids --output_names=logits_indices,sentence_lengths
* quantize graph
# not working properly
$ ${TENSORFLOW_SOURCE_DIR}/bazel-bin/tensorflow/tools/quantization/quantize_graph --input=exported/ner_frozen.pb --output=exported/ner_frozen.pb.rounded --output_node_names=logits_indices,sentence_lengths --mode=weights_rounded
* transform graph
$ ${TENSORFLOW_SOURCE_DIR}/bazel-bin/tensorflow/tools/graph_transforms/transform_graph --in_graph=exported/ner_frozen.pb --out_graph=exported/ner_frozen.pb.transformed --inputs=is_train,sentence_length,input_data_pos_ids,input_data_chk_ids,input_data_word_ids,input_data_wordchr_ids --outputs=logits_indices,sentence_lengths --transforms='strip_unused_nodes merge_duplicate_nodes round_weights(num_steps=256) sort_by_execution_order'
* convert to memory mapped format
$ ${TENSORFLOW_SOURCE_DIR}/bazel-bin/tensorflow/contrib/util/convert_graphdef_memmapped_format --in_graph=exported/ner_frozen.pb --out_graph=exported/ner_frozen.pb.memmapped
or
$ ${TENSORFLOW_SOURCE_DIR}/bazel-bin/tensorflow/contrib/util/convert_graphdef_memmapped_format --in_graph=exported/ner_frozen.pb.transformed --out_graph=exported/ner_frozen.pb.memmapped
* inference using C++
$ ./cc/build/inference exported/ner_frozen.pb.memmapped embeddings/vocab.txt 1 < ../data/test.txt > pred.txt
* check `pred.txt` to confirm that the predictions are the same.
$ python ../token_eval.py < pred.txt
* verify that the memory-mapped graph is opened with MAP_SHARED (replace pid with the ID of the running process; the `s` in `r--s` means shared)
$ cat /proc/pid/maps
7fae40522000-7fae4a000000 r--s 00000000 08:11 749936602 /root/etagger/inference/exported/ner_frozen.pb.memmapped
...
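The difference between MAP_SHARED and MAP_PRIVATE in `/proc/<pid>/maps` can be reproduced on Linux without TensorFlow. The sketch below maps a file read-only with Python's `mmap` module and reads back the permission field of that mapping; `mapping_perms` is a hypothetical helper and the temporary file merely stands in for the memmapped graph.

```python
import mmap
import os
import tempfile


def mapping_perms(path: str, shared: bool) -> str:
    """Map `path` read-only and return its permission field from /proc/self/maps."""
    flags = mmap.MAP_SHARED if shared else mmap.MAP_PRIVATE
    real = os.path.realpath(path)
    with open(path, "rb") as f:
        m = mmap.mmap(f.fileno(), 0, flags=flags, prot=mmap.PROT_READ)
        try:
            with open("/proc/self/maps") as maps:
                for line in maps:
                    # Fields: address perms offset dev inode pathname
                    fields = line.split(None, 5)
                    if len(fields) == 6 and fields[5].strip() == real:
                        return fields[1]
        finally:
            m.close()
    raise RuntimeError("mapping not found in /proc/self/maps")


if __name__ == "__main__":
    # Temporary file standing in for the memmapped graph (assumption).
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"x" * 4096)
    try:
        print("shared :", mapping_perms(tmp.name, shared=True))
        print("private:", mapping_perms(tmp.name, shared=False))
    finally:
        os.unlink(tmp.name)
```

A read-only shared mapping lets several processes that load the same model file use the same page-cache pages, which is presumably the motivation for the MAP_PRIVATE -> MAP_SHARED source change.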