SthPhoenix / InsightFace-REST

InsightFace REST API for easy deployment of face recognition services with TensorRT in Docker.


Batch inference detection model

jhanvi22 opened this issue · comments

I’m trying to convert the retinaface_mnet025_v1 ONNX model to TensorRT with batch size 64.
During inference it takes 390 ms for batch inference (batch_size = 64)
and 7 ms for batch size = 1.

Please let me know if I’m missing something.
Attaching a Google Drive link for TensorRT model generation:
here
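For reference, here is a minimal sketch of how a fixed-batch FP16 engine can be built with the TensorRT Python API (TensorRT 7.x style). This is not the attached script; the single-input assumption and the dynamic-batch ONNX input shape are assumptions for illustration:

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.INFO)

def build_engine(onnx_path, batch_size=64, fp16=True):
    builder = trt.Builder(TRT_LOGGER)
    # Explicit batch is required for ONNX models in TensorRT 7+
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, TRT_LOGGER)

    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            raise RuntimeError("Failed to parse ONNX model")

    config = builder.create_builder_config()
    config.max_workspace_size = 1 << 30  # 1 GiB
    if fp16 and builder.platform_has_fast_fp16:
        config.set_flag(trt.BuilderFlag.FP16)

    # With explicit batch the batch dimension is part of the input shape;
    # if the ONNX export has a dynamic batch dimension, an optimization
    # profile pins it to the desired size (here: exactly batch_size).
    inp = network.get_input(0)
    shape = tuple(inp.shape)            # e.g. (-1, 3, 480, 640)
    fixed = (batch_size,) + shape[1:]
    profile = builder.create_optimization_profile()
    profile.set_shape(inp.name, fixed, fixed, fixed)
    config.add_optimization_profile(profile)

    return builder.build_engine(network, config)
```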

Hi!
390 ms / 64 ≈ 6.1 ms per image.
That's faster than single-image inference, but indeed not by a large margin. Try different batch sizes; 64 might be too high for your GPU.

Got it. I'll try with batch size < 64.

Hi @jhanvi22! Have you managed to improve batch inference speed?
I have just noticed a possible bug in the TensorRT Python API: when building an engine with explicit batch and FP16 support, the FP16 flag is ignored and inference is executed in FP32 precision, which almost negates the benefits of batch processing.
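The thread doesn't show the actual patch, but a common cause of this symptom in the TensorRT 7.x Python API is setting FP16 on the builder itself instead of on the builder config; with the explicit-batch / builder-config build path, only the config flags are honored. A hedged sketch of the difference (not necessarily the exact bug fixed here):

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(TRT_LOGGER)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
config = builder.create_builder_config()

# ... parse the ONNX model into `network` here ...

# Deprecated builder-level flag: silently ignored when the engine is built
# from a builder config, so the engine ends up running in FP32.
# builder.fp16_mode = True

# Config-level flag: this is what builder.build_engine(network, config) respects.
if builder.platform_has_fast_fp16:
    config.set_flag(trt.BuilderFlag.FP16)
```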

Hi. I was not able to move forward with batch inference. I mean yes, the time taken to infer 64 images with batch size 64 is much lower than the time taken to infer 64 images with batch size 1. I didn't get time to test it with smaller batch sizes; I'll update on that soon.

I have added a fix for batch inference with FP16 support. Please check if it helps in your case. Make sure you have set force_fp16=True if your GPU supports it.

Closing due to inactivity; feel free to reopen if you have any updates.