Inference time seems incorrect since CUDA is asynchronous?
PonyPC opened this issue · comments
PonyPC commented
In `python/lib/Processor.py`:
start = time.time()
self.context.execute_async_v2(
bindings=self.bindings,
stream_handle=self.stream.handle)
end = time.time()
print('execution time:', end-start)
Sean Pollock commented
Yes @PonyPC, good call! The timing bookend should be placed after the output has been copied from the GPU back to the host and the stream has been synchronized. Will update the README so the inference times listed are not misleading.
https://github.com/SeanAvery/yolov5-tensorrt/blob/master/python/lib/Processor.py#L97
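The pitfall here is that `execute_async_v2` only enqueues work on the stream and returns immediately, so timing the call measures launch overhead rather than inference. In the real code the fix is to call `stream.synchronize()` (pycuda) before reading the clock. The sketch below illustrates the same effect without a GPU, using a background thread as a stand-in for the CUDA stream; `FakeStream` is a hypothetical illustration, not the TensorRT or pycuda API.

```python
import time
import threading

class FakeStream:
    """Stand-in for a CUDA stream: launches return immediately,
    synchronize() blocks until the queued work is done."""

    def __init__(self):
        self._thread = None

    def launch(self, seconds):
        # Returns immediately, like context.execute_async_v2().
        self._thread = threading.Thread(target=time.sleep, args=(seconds,))
        self._thread.start()

    def synchronize(self):
        # Blocks until the work finishes, like stream.synchronize().
        if self._thread is not None:
            self._thread.join()

stream = FakeStream()

# Wrong: timing only the launch measures enqueue overhead, not the work.
start = time.time()
stream.launch(0.05)
launch_only = time.time() - start

# Right: synchronize before reading the clock.
stream.synchronize()
full_time = time.time() - start

print(launch_only < full_time)  # the launch alone looks much faster
```

The same pattern applies on the GPU: bracket the launch with `start = time.time()` and, after `stream.synchronize()`, read `end = time.time()` to get the true inference latency.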
I honestly have not even taken a serious look at speed.
The first thing I want to do is optimize post-processing and NMS, moving all the NumPy ops to pycuda.
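For context, the CPU-side work being discussed looks roughly like the following minimal NumPy non-maximum suppression. This is a generic greedy-NMS sketch, not the repo's actual implementation; box format and the IoU threshold are assumptions.

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.45):
    """Greedy NMS on the CPU.

    boxes: (N, 4) array of [x1, y1, x2, y2]
    scores: (N,) array of confidences
    Returns indices of the boxes to keep.
    """
    order = scores.argsort()[::-1]  # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        if order.size == 1:
            break
        rest = order[1:]
        # Intersection rectangle between box i and the remaining boxes.
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        # Drop boxes that overlap box i too much; keep the rest.
        order = rest[iou <= iou_thresh]
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [100, 100, 110, 110]], dtype=np.float32)
scores = np.array([0.9, 0.8, 0.7], dtype=np.float32)
print(nms(boxes, scores))  # → [0, 2]
```

Every iteration of the loop touches host memory, which is why moving these ops into a pycuda kernel (and keeping the detections on-device after inference) should help end-to-end latency.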