NVIDIA TensorRT implementation for facenet with a pre-trained SavedModel. facenet is a TensorFlow face-recognition project from https://github.com/davidsandberg/facenet.
- facenet.py: Enable the facenet pre-trained SavedModel with TRT
- face.py: Add a probability threshold for returned matches, change the minimum face size to 50px, and change gpu_memory_fraction to 0.3
- /align/detect_face.py: Enable TensorRT for the PNET, RNET and ONET graphs
- face.py and facenet.py: Minor changes to support multi-threading
- face.py: Change input:0 to batch_join:0 to support both TensorRT 4 and TensorRT 5
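The SavedModel changes above rely on TensorFlow's TF-TRT integration. A minimal sketch of how a frozen facenet graph can be handed to TF-TRT using the TF 1.x contrib API; the helper name and the default batch size and workspace size here are my assumptions, not values taken from the patched files:

```python
# Sketch only: TF-TRT conversion of a frozen GraphDef (TF 1.x contrib API).
# The function name and default parameters are illustrative assumptions.
_PRECISION_MODES = ("FP32", "FP16", "INT8")

def convert_to_trt(frozen_graph_def, output_names, precision="FP16",
                   max_batch_size=1, workspace_bytes=1 << 30):
    """Return a GraphDef in which TensorRT-supported subgraphs are replaced
    by TRTEngineOp nodes; unsupported ops fall back to plain TensorFlow."""
    if precision not in _PRECISION_MODES:
        raise ValueError("precision must be one of %s" % (_PRECISION_MODES,))
    # Imported lazily: requires a TensorFlow 1.x build with TensorRT support.
    import tensorflow.contrib.tensorrt as trt
    return trt.create_inference_graph(
        input_graph_def=frozen_graph_def,
        outputs=output_names,            # e.g. ["embeddings"] for facenet
        max_batch_size=max_batch_size,
        max_workspace_size_bytes=workspace_bytes,
        precision_mode=precision)
```

The precision mode is the main knob: the FP32/FP16 rows in the result tables below correspond to this parameter.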
"NVIDIA announced the integration of our TensorRT inference optimization tool with TensorFlow. TensorRT integration will be available for use in the TensorFlow 1.7 branch. TensorFlow remains the most popular deep learning framework today while NVIDIA TensorRT speeds up deep learning inference through optimizations and high-performance runtimes for GPU-based platforms. We wish to give TensorFlow users the highest inference performance possible along with a near transparent workflow using TensorRT. The new integration provides a simple API which applies powerful FP16 and INT8 optimizations using TensorRT from within TensorFlow. TensorRT sped up TensorFlow inference by 8x for low latency runs of the ResNet-50 benchmark." - from NVIDIA website.
The latest TensorRT version is 5.0.4.
See details at the links below:
https://devblogs.nvidia.com/tensorrt-integration-speeds-tensorflow-inference/
https://docs.nvidia.com/deeplearning/dgx/integrate-tf-trt/index.html
Support matrix: https://docs.nvidia.com/deeplearning/sdk/tensorrt-archived/index.html
TensorRT installation guide: https://developer.download.nvidia.com/compute/machine-learning/tensorrt/docs/5.0/GA_5.0.2.6/TensorRT-Installation-Guide.pdf
Download facenet.py, face.py (optional) and /align/detect_face.py, and replace the original files in the facenet repository.
HW | Ubuntu | Driver | CUDA | cuDNN | TensorRT | TensorFlow |
---|---|---|---|---|---|---|
Tesla V100 GPU, Intel x86_64 | 16.04 | 384.111 | 9.0.179 | 7.3.1 | 4.0.1.6 | 1.12 GPU |
Quadro GV100 GPU, Intel x86_64 | 18.04 | 410.93 | 10.0.117 | 7.3.1 | 5.0.3 | 1.12 GPU |
Jetson Xavier, integrated GV10B GPU | 18.04 | L4T 4.1.1 | 10.0.117 | 7.3.1 | 5.0.3 | 1.12 GPU |
*Note: this table only compares the SavedModel runtime improvement for the face-identification Inception-ResNet-v1 network. Xavier refers to Jetson Xavier with L4T 4.1.1.
TensorRT 4 result
Face detection with MTCNN: tested 30 times with different images at different resolutions
Detect Network | Avg Time |
---|---|
original network ckpt | 41.948318 ms |
tensorrt network FP32 | 41.948318 ms |
tensorrt network FP16 | 42.028268 ms |
*Note: I suspect the MTCNN network is not being converted to a TensorRT network automatically; I will investigate further and try a plugin later. I also found no improvement with the checkpoint file, so a similar conversion method may not yield an improvement for the MTCNN graph. This may be a bug in TRT; still working on it.
Face identification with Inception-ResNet-v1: tested 27 times with different images (cropped and aligned to 160x160)
Identify Network | Avg Time |
---|---|
original network ckpt | 13.713258 ms |
tensorrt network FP32 | 11.296281 ms |
tensorrt network FP16 | 10.54711 ms |
*Note: INT8 is not implemented due to issues that may be the same as tensorflow/tensorflow#22854.
*Note: These results are based on the SavedModel file; the frozen graph from checkpoints shows no runtime improvement, which may be a bug. Still working on it.
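The average times in these tables come from simple wall-clock measurement. A minimal, framework-agnostic sketch; the `run_once` callable stands in for the actual `sess.run` inference call and is an assumption:

```python
import time

def average_ms(run_once, warmup=1, repeats=27):
    """Average wall-clock time of run_once() in milliseconds.
    Warmup runs are excluded because the first call pays the one-time
    graph/engine initialization cost."""
    for _ in range(warmup):
        run_once()
    start = time.perf_counter()
    for _ in range(repeats):
        run_once()
    return (time.perf_counter() - start) * 1000.0 / repeats
```

Usage would be e.g. `average_ms(lambda: sess.run(embeddings, feed_dict=feed), repeats=27)` for the identification benchmark.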
TensorRT 5 result
Similar to TRT4, but the runtime improvement with the SavedModel is about 11.89% on GV100.
TensorRT 5 on Xavier result
Similar to TRT4, but the runtime improvement with the SavedModel is about 23.15% on Xavier: tested 20 times with the same image (cropped and aligned to 160x160, excluding the first long initialization run)
Identify Network | Avg Time |
---|---|
original network ckpt | 45.034961 ms |
tensorrt network savedmodel FP16 | 37.567716 ms |
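The input:0 to batch_join:0 change in the file list above matters when feeding the loaded graph. A sketch of running embeddings against a (converted) GraphDef; the tensor names follow the usual facenet conventions and batch_join:0 is taken from the change list, but treat all of them as assumptions for your model, and note this requires a TF 1.x GPU build:

```python
def run_embeddings(graph_def, images):
    """Run a (TRT-converted) facenet graph on a batch of aligned
    160x160 face crops and return their embedding vectors."""
    import tensorflow as tf  # requires a TF 1.x build (GPU, TensorRT)
    with tf.Graph().as_default() as graph:
        tf.import_graph_def(graph_def, name="")
        # batch_join:0 rather than input:0, per the change list above,
        # so the same code works with both TensorRT 4 and TensorRT 5.
        images_in = graph.get_tensor_by_name("batch_join:0")
        phase_train = graph.get_tensor_by_name("phase_train:0")
        embeddings = graph.get_tensor_by_name("embeddings:0")
        with tf.Session(graph=graph) as sess:
            return sess.run(embeddings,
                            feed_dict={images_in: images, phase_train: False})
```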