SthPhoenix / InsightFace-REST

InsightFace REST API for easy deployment of face recognition services with TensorRT in Docker.


Batch inference detection model

jhanvi22 opened this issue · comments

I’m trying to convert the retinaface_mnet025_v1 ONNX model to TensorRT with batch size 64.
During inference it takes 390 ms for batch inference (batch_size = 64)
and 7 ms for batch size = 1.

Please let me know if I’m missing something.
Attaching a Google Drive link for TensorRT model generation:
here
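For reference, here is a minimal sketch of how a fixed-batch FP16 engine can be built with the TensorRT Python API (TensorRT 7.x style). This is not the attached script; the single-input assumption and the dynamic-batch ONNX input shape are assumptions for illustration:

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.INFO)

def build_engine(onnx_path, batch_size=64, fp16=True):
    builder = trt.Builder(TRT_LOGGER)
    # Explicit batch is required for ONNX models in TensorRT 7+
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, TRT_LOGGER)

    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            raise RuntimeError("Failed to parse ONNX model")

    config = builder.create_builder_config()
    config.max_workspace_size = 1 << 30  # 1 GiB
    if fp16 and builder.platform_has_fast_fp16:
        config.set_flag(trt.BuilderFlag.FP16)

    # With explicit batch the batch dimension is part of the input shape;
    # if the ONNX export has a dynamic batch dimension, an optimization
    # profile pins it to the desired size (here: exactly batch_size).
    inp = network.get_input(0)
    shape = tuple(inp.shape)            # e.g. (-1, 3, 480, 640)
    fixed = (batch_size,) + shape[1:]
    profile = builder.create_optimization_profile()
    profile.set_shape(inp.name, fixed, fixed, fixed)
    config.add_optimization_profile(profile)

    return builder.build_engine(network, config)
```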

Hi!
390 ms / 64 ≈ 6.1 ms per image.
That's faster than single-image inference, but indeed not by a large margin. Try different batch sizes; 64 might be too high for your GPU.

Got it. I'll try with batch size < 64.

Hi @jhanvi22! Have you managed to improve batch inference speed?
I have just noticed a possible bug in the TensorRT Python API: when building an engine with explicit batch and FP16 support, the FP16 flag is ignored and inference is executed in FP32 precision, which almost negates the benefits of batch processing.
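The thread doesn't show the actual patch, but a common cause of this symptom in the TensorRT 7.x Python API is setting FP16 on the builder itself instead of on the builder config; with the explicit-batch / builder-config build path, only the config flags are honored. A hedged sketch of the difference (not necessarily the exact bug fixed here):

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(TRT_LOGGER)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
config = builder.create_builder_config()

# ... parse the ONNX model into `network` here ...

# Deprecated builder-level flag: silently ignored when the engine is built
# from a builder config, so the engine ends up running in FP32.
# builder.fp16_mode = True

# Config-level flag: this is what builder.build_engine(network, config) respects.
if builder.platform_has_fast_fp16:
    config.set_flag(trt.BuilderFlag.FP16)
```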

Hi. I was not able to move forward with batch inference. I mean yes, the time taken to infer 64 images with batch size 64 is much lower than the time taken to infer 64 images with batch size 1. I didn't get time to test it with smaller batch sizes; I'll update on that soon.

I have added a fix for batch inference with FP16 support. Please check if it helps in your case. Make sure you have set force_fp16=True if your GPU supports it.

Closing due to inactivity; feel free to reopen if you have any updates.