seanavery / yolov5-tensorrt

YOLOv5 in TensorRT


Inference time seems incorrect since CUDA is asynchronous?

PonyPC opened this issue · comments

In 'python/lib/Processor.py':

        start = time.time()
        # execute_async_v2 only enqueues work on the CUDA stream and
        # returns immediately, so end - start does not measure inference
        self.context.execute_async_v2(
                bindings=self.bindings,
                stream_handle=self.stream.handle)
        end = time.time()
        print('execution time:', end-start)

Yes @PonyPC, good call! The timing bookend should close only after the output has been copied from the GPU back to the host and the stream has been synchronized. Will update the README so the inference times listed are not misleading.

https://github.com/SeanAvery/yolov5-tensorrt/blob/master/python/lib/Processor.py#L97
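For reference, a minimal sketch of what the corrected bookend could look like inside Processor.py's inference path (host_output and cuda_output are assumed buffer names for illustration, not necessarily the attributes the repo uses):

    import time
    import pycuda.driver as cuda

    start = time.time()
    self.context.execute_async_v2(
        bindings=self.bindings,
        stream_handle=self.stream.handle)
    # copy the output from device back to host (still asynchronous)
    cuda.memcpy_dtoh_async(self.host_output, self.cuda_output, self.stream)
    # block until everything queued on the stream has actually finished
    self.stream.synchronize()
    end = time.time()
    print('execution time:', end - start)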

I honestly have not even taken a serious look at speed.

The first thing I want to do is optimize post-processing and NMS -- moving all numpy ops to pycuda.
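As a hedged illustration of that direction (not code from this repo), a sigmoid over the raw head output could be moved from numpy onto the GPU with a pycuda ElementwiseKernel, avoiding a round trip to the CPU:

    import numpy as np
    import pycuda.autoinit
    import pycuda.gpuarray as gpuarray
    from pycuda.elementwise import ElementwiseKernel

    # elementwise sigmoid that runs on the GPU instead of in numpy
    sigmoid = ElementwiseKernel(
        'float *out, float *in',
        'out[i] = 1.0f / (1.0f + expf(-in[i]))',
        'sigmoid_kernel')

    raw = np.random.randn(25200, 85).astype(np.float32)  # stand-in head output
    d_raw = gpuarray.to_gpu(raw)
    d_act = gpuarray.empty_like(d_raw)
    sigmoid(d_act, d_raw)
    activated = d_act.get()  # copy back only once post-processing is done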

I have tested this at about 6 FPS on a Jetson Nano. Is that a normal speed?
Also, how would I implement a batch size greater than 1 to boost throughput?
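In case it is useful, a rough sketch of how batched inference generally looks with TensorRT's explicit-batch API (this is not from the repo; 'yolov5s_batch.trt' is a hypothetical engine that would need to be built with a dynamic-batch optimization profile, and binding 1 is assumed to be the single output):

    import numpy as np
    import tensorrt as trt
    import pycuda.autoinit
    import pycuda.driver as cuda

    BATCH = 4  # assumed batch size

    # load a hypothetical engine built with an optimization profile
    # covering shapes (1, 3, 640, 640) up to (BATCH, 3, 640, 640)
    logger = trt.Logger(trt.Logger.WARNING)
    with open('yolov5s_batch.trt', 'rb') as f:
        engine = trt.Runtime(logger).deserialize_cuda_engine(f.read())
    context = engine.create_execution_context()

    # with an explicit-batch engine, set the actual input shape per launch
    context.set_binding_shape(0, (BATCH, 3, 640, 640))

    # size host/device buffers for the whole batch
    inp = np.random.randn(BATCH, 3, 640, 640).astype(np.float32)
    d_in = cuda.mem_alloc(inp.nbytes)
    h_out = cuda.pagelocked_empty(tuple(context.get_binding_shape(1)), np.float32)
    d_out = cuda.mem_alloc(h_out.nbytes)
    stream = cuda.Stream()

    cuda.memcpy_htod_async(d_in, np.ascontiguousarray(inp), stream)
    context.execute_async_v2([int(d_in), int(d_out)], stream.handle)
    cuda.memcpy_dtoh_async(h_out, d_out, stream)
    stream.synchronize()  # h_out now holds BATCH sets of detections

Batching usually improves throughput because the GPU is better saturated, though per-image latency tends to go up.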

Thanks for your great work.