SthPhoenix / InsightFace-REST

InsightFace REST API for easy deployment of face recognition services with TensorRT in Docker.

TRT batch inference results error

yuxianmin opened this issue

Using the TRT backend, with max_batch_size 2 for testing.
When I run inference on a single image, the returned result is correct, but if two different images are submitted together, the second image gets wrong detections; if the two images are the same, the results are correct.

What could be the reason for this? Thanks.

Some example errors, using scrfd_500m_bnkps_640_640_batch2.plan:

  1. When using:
    [
    "test_images/Stallone.jpg",
    "test_images/mask.jpg"
    ]

*(screenshot of detection results)*

  2. When using:
    [
    "test_images/mask.jpg",
    "test_images/lumia.jpg"
    ]

*(screenshot of detection results)*
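For anyone trying to reproduce this, a request along these lines should exercise the batched code path. This is a minimal sketch: the `/extract` endpoint, the port, and the payload keys are assumptions based on a typical InsightFace-REST deployment, so check the API docs (`/docs`) of your instance.

```python
import requests

# Hypothetical request against a local InsightFace-REST instance;
# endpoint name, port, and payload shape may differ in your deployment.
payload = {
    "images": {
        "urls": [
            "test_images/Stallone.jpg",
            "test_images/mask.jpg",
        ]
    }
}

resp = requests.post("http://localhost:18081/extract", json=payload)
resp.raise_for_status()

# Expect one result entry per submitted image.
for i, face_data in enumerate(resp.json().get("data", [])):
    print(f"image {i}: {face_data}")
```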

Hi! That's interesting. Could you please run the same test with the yolov5s-face model to narrow down the issue?

Should be working as expected now ) Absolutely dumb mistake with array offsets )

Thanks for replying so quickly.
In addition to this offset problem, there may be another issue that needs fixing.
When testing with batch input, I found a problem with the first image: after debugging, it turns out that self.score_list / self.bbox_list / self.kpss_list get overwritten by the following image, producing errors in the previous image's results.
After resetting them, the results are OK.

*(screenshot)*

Actually, those arrays aren't reset intentionally: I found that reallocating them can noticeably impact performance, so I allocate the memory during initialization and then just assign new values.
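For illustration, the allocate-once pattern looks roughly like this. It is a simplified sketch with hypothetical names and shapes, not the actual project code:

```python
import numpy as np

class DetectionBuffers:
    def __init__(self, max_dets: int = 1000):
        # Allocated once at startup; reallocating these on every request
        # was found to noticeably hurt throughput, so they are reused.
        self.score_list = np.zeros((max_dets,), dtype=np.float32)
        self.bbox_list = np.zeros((max_dets, 4), dtype=np.float32)
        self.kpss_list = np.zeros((max_dets, 5, 2), dtype=np.float32)

    def write(self, offset, scores, bboxes, kpss):
        # New detections are assigned in place instead of allocating
        # fresh arrays per image. Expected shapes: scores (n,),
        # bboxes (n, 4), kpss (n, 5, 2).
        n = len(scores)
        self.score_list[offset:offset + n] = scores
        self.bbox_list[offset:offset + n] = bboxes
        self.kpss_list[offset:offset + n] = kpss
        return offset + n
```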

What kind of errors do you get with the first image? I'll investigate it more thoroughly.

If they aren't reset, then because different images share these arrays, the previous image's detection data gets overwritten by the next image's when the offset restarts from 0 in the batch loop.

For two images, batch=2:

[
"test_images/mask.jpg",
"test_images/lumia.jpg"
]

Errors in the first image:

*(screenshot of detection results)*

Yes, I can reproduce this behavior too. Array slices are passed to the output by reference instead of by value; I'll fix it shortly.
Thanks for pointing out the issue!

By the way, I have a question: when using batch inference (e.g. scrfd_500m_bnkps.onnx) on GPU with onnxruntime-gpu, the batch inference time (inference only, not including pre- and post-processing) is basically not improved. Have you encountered this? The model is running on the GPU.
Thanks.

I haven't added batch inference for onnxruntime, so it should be processing images one by one.

I remember there were some issues with batch inference enabled when using onnxruntime on CPU, though I can't recall if there were any issues on GPU.
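To check where the time goes, you could time batched versus sequential runs directly in onnxruntime. This is a sketch; it assumes the exported ONNX model has a dynamic batch dimension and a 640x640 input:

```python
import time

import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession(
    "scrfd_500m_bnkps.onnx",
    providers=["CUDAExecutionProvider"],
)
input_name = sess.get_inputs()[0].name

single = np.random.rand(1, 3, 640, 640).astype(np.float32)
batch = np.random.rand(2, 3, 640, 640).astype(np.float32)

# Warm up so session/graph initialization doesn't skew the timings.
for _ in range(10):
    sess.run(None, {input_name: batch})

t0 = time.perf_counter()
for _ in range(100):
    sess.run(None, {input_name: single})
    sess.run(None, {input_name: single})
t1 = time.perf_counter()

for _ in range(100):
    sess.run(None, {input_name: batch})
t2 = time.perf_counter()

print(f"2 x batch=1: {(t1 - t0) * 10:.2f} ms/iter")
print(f"1 x batch=2: {(t2 - t1) * 10:.2f} ms/iter")
```

For a model as small as scrfd_500m, per-call overhead (kernel launches and host-to-device copies) can dominate on GPU, so a batch of 2 may show little speedup over two sequential calls.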

Okay, thanks for your reply!