Inference time seems incorrect since CUDA is asynchronous?
PonyPC opened this issue · comments
PonyPC commented
In `python/lib/Processor.py`:
start = time.time()
self.context.execute_async_v2(
bindings=self.bindings,
stream_handle=self.stream.handle)
end = time.time()
print('execution time:', end-start)
Sean Pollock commented
Yes @PonyPC, good call! The timing bookend should be placed after the output has been copied from the GPU back to the host and the stream has been synchronized. Will update the README so the inference times listed are not misleading.
https://github.com/SeanAvery/yolov5-tensorrt/blob/master/python/lib/Processor.py#L97
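The pitfall here is that `execute_async_v2` only enqueues work on the stream and returns immediately, so timing the call measures launch overhead rather than inference. In the real code the fix is to call `stream.synchronize()` (pycuda) before reading the clock. The sketch below illustrates the same effect without a GPU, using a background thread as a stand-in for the CUDA stream; `FakeStream` is a hypothetical illustration, not the TensorRT or pycuda API.

```python
import time
import threading

class FakeStream:
    """Stand-in for a CUDA stream: launches return immediately,
    synchronize() blocks until the queued work is done."""

    def __init__(self):
        self._thread = None

    def launch(self, seconds):
        # Returns immediately, like context.execute_async_v2().
        self._thread = threading.Thread(target=time.sleep, args=(seconds,))
        self._thread.start()

    def synchronize(self):
        # Blocks until the work finishes, like stream.synchronize().
        if self._thread is not None:
            self._thread.join()

stream = FakeStream()

# Wrong: timing only the launch measures enqueue overhead, not the work.
start = time.time()
stream.launch(0.05)
launch_only = time.time() - start

# Right: synchronize before reading the clock.
stream.synchronize()
full_time = time.time() - start

print(launch_only < full_time)  # the launch alone looks much faster
```

The same pattern applies on the GPU: bracket the launch with `start = time.time()` and, after `stream.synchronize()`, read `end = time.time()` to get the true inference latency.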
I honestly have not even taken a serious look at speed.
The first thing I want to do is optimize post-processing and NMS, moving all the NumPy ops to pycuda.
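For context, the CPU-side work being discussed looks roughly like the following minimal NumPy non-maximum suppression. This is a generic greedy-NMS sketch, not the repo's actual implementation; box format and the IoU threshold are assumptions.

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.45):
    """Greedy NMS on the CPU.

    boxes: (N, 4) array of [x1, y1, x2, y2]
    scores: (N,) array of confidences
    Returns indices of the boxes to keep.
    """
    order = scores.argsort()[::-1]  # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        if order.size == 1:
            break
        rest = order[1:]
        # Intersection rectangle between box i and the remaining boxes.
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        # Drop boxes that overlap box i too much; keep the rest.
        order = rest[iou <= iou_thresh]
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [100, 100, 110, 110]], dtype=np.float32)
scores = np.array([0.9, 0.8, 0.7], dtype=np.float32)
print(nms(boxes, scores))  # → [0, 2]
```

Every iteration of the loop touches host memory, which is why moving these ops into a pycuda kernel (and keeping the detections on-device after inference) should help end-to-end latency.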