clancylian / retinaface

Reimplement RetinaFace use C++ and TensorRT

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

RetinaFace C++ Reimplement

source

Reference resources RetinaFace in insightface with python code.

model transformation tool

MXNet2Caffe

you need to add some layers yourself, and in caffe there is not upsample,you can replace with deconvolution,and maybe slight accuracy loss.

the origin model reference from mobilenet25,and I have retrain it.

Demo

$ mkdir build
$ cd build/
$ cmake ../
$ make

you need to modify dependency path in CmakeList file.

Speed

test hardware:1080Ti

test1:

model speed input size preprocess time inference postprocess time
mxnet 44.8ms 1280x896 19.0ms 8.0ms 16.0ms
caffe 46.9ms 1280x896 5.8ms 24.1ms 16.0ms
tensorrt 29.3ms 1280x896 6.9ms 5.4ms 15.0ms

test2:

model speed inputsize preprocess time inference postprocess time
mxnet 6.4ms 320x416 1.3ms 0.1ms 4.2ms
caffe 30.8ms 320x416 1.2ms 27ms 2.3ms
tensorrt 4.7ms 320x416 0.7ms 1.9ms 1.8ms

tensorrt batch test:

batchsize inputsize maxbatchsize preprocess time inference postprocess time all GPU
1 448x448 8 1.0ms 2.3ms 2.6ms 6.7ms 35%
2 448x448 8 2.5ms 3.3ms 5.2ms 11.8ms 33%
4 448x448 8 4.1ms 4.6ms 10.0ms 21.8ms 28%
8 448x448 8 8.7ms 7.0ms 20.3ms 40.7ms 23%
16 448x448 32 28.1 14.7 38.7ms 92.0ms -
32 448x448 32 36.2ms 26.3 75.7ms 163.5ms -

note: batch size have some advantage in inference but can't speed up preprocess and postprocess.

optimize post process:

batchsize inputsize maxbatchsize preprocess time inference postprocess time all GPU
1 448x448 8 1.0ms 2.3ms 0.09ms 3.5ms 70%
2 448x448 8 2.2ms 2.8ms 0.2ms 5.3ms 60%
4 448x448 8 3.7ms 5.0ms 0.3ms 8.4ms 55%
8 448x448 8 7.5ms 6.5ms 0.67ms 14.9ms 50%
16 448x448 32 26ms 13ms 1.3ms 41ms 40%
32 448x448 32 32ms 22ms 2.7ms 56.6ms 50%

use nvidia npp library to speed up preprocess:

batchsize inputsize maxbatchsize preprocess time inference postprocess time all GPU
1 448x448 8 0.2ms 2.3ms 0.1ms 2.6ms 91%
2 448x448 8 0.3ms 3.0ms 0.2ms 3.5ms 85%
4 448x448 8 0.5ms 4.1ms 0.32ms 5.0ms 82%
8 448x448 8 1.2ms 6.3ms 0.77ms 8.3ms 79%
16 448x448 32 2.2ms 14ms 1.3ms 16.7ms 80%
32 448x448 32 5.0ms 22ms 2.8ms 29.3ms 77%

INT8 inference

INT8 calibration table can generate by INT8-Calibration-Tool.

Accuracy

https://raw.githubusercontent.com/clancylian/retinaface/master/data/retinaface-widerface%E6%B5%8B%E8%AF%95.png

About

Reimplement RetinaFace use C++ and TensorRT

License:MIT License


Languages

Language:C++ 72.5%Language:Python 12.6%Language:Cuda 8.9%Language:CMake 3.2%Language:QMake 2.6%Language:C 0.2%