lyuwenyu / RT-DETR

[CVPR 2024] Official RT-DETR (RTDETR paddle pytorch), Real-Time DEtection TRansformer, DETRs Beat YOLOs on Real-time Object Detection. 🔥 🔥 🔥

Question about measuring TensorRT inference latency

leoxxxxxD opened this issue · comments

commented

The model is rtdetr_r18vd_6x_coco. You report 217 FPS for it, but I am not sure which metric that number was computed from. My inference results on a T4 are as follows:
[02/23/2024-03:07:42] [I] === Performance summary ===
[02/23/2024-03:07:42] [I] Throughput: 164.577 qps
[02/23/2024-03:07:42] [I] Latency: min = 5.90283 ms, max = 7.3186 ms, mean = 6.05075 ms, median = 6.0332 ms, percentile(90%) = 6.14221 ms, percentile(95%) = 6.26196 ms, percentile(99%) = 6.35991 ms
[02/23/2024-03:07:42] [I] Enqueue Time: min = 5.88025 ms, max = 7.29272 ms, mean = 6.0273 ms, median = 6.00977 ms, percentile(90%) = 6.11768 ms, percentile(95%) = 6.23471 ms, percentile(99%) = 6.33209 ms
[02/23/2024-03:07:42] [I] H2D Latency: min = 0.800537 ms, max = 0.845398 ms, mean = 0.818772 ms, median = 0.818604 ms, percentile(90%) = 0.821167 ms, percentile(95%) = 0.823975 ms, percentile(99%) = 0.827393 ms
[02/23/2024-03:07:42] [I] GPU Compute Time: min = 5.07678 ms, max = 6.48608 ms, mean = 5.22236 ms, median = 5.20569 ms, percentile(90%) = 5.31421 ms, percentile(95%) = 5.43716 ms, percentile(99%) = 5.52832 ms
[02/23/2024-03:07:42] [I] D2H Latency: min = 0.00805664 ms, max = 0.0310059 ms, mean = 0.00961827 ms, median = 0.00927734 ms, percentile(90%) = 0.0107727 ms, percentile(95%) = 0.0113831 ms, percentile(99%) = 0.0145264 ms
[02/23/2024-03:07:42] [I] Total Host Walltime: 3.01379 s
[02/23/2024-03:07:42] [I] Total GPU Compute Time: 2.59029 s
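As a rough cross-check of the summary above, the sketch below recomputes the reported throughput from the other figures in the log. It assumes (this is an interpretation, not something stated in the thread) that Total GPU Compute Time is the sum of the per-query GPU compute times and that throughput is the number of queries divided by Total Host Walltime. All variable names are illustrative.

```python
# Values copied from the trtexec performance summary above.
total_host_walltime_s = 3.01379
total_gpu_compute_s = 2.59029
mean_gpu_compute_ms = 5.22236
reported_throughput_qps = 164.577

# Assumption: total GPU compute time = number of queries * mean GPU compute time.
num_queries = round(total_gpu_compute_s / (mean_gpu_compute_ms / 1000.0))

# Assumption: throughput = number of queries / host wall-clock time.
throughput_from_walltime = num_queries / total_host_walltime_s

print(f"estimated queries: {num_queries}")
print(f"throughput from walltime: {throughput_from_walltime:.3f} qps")
print(f"reported throughput:      {reported_throughput_qps:.3f} qps")
```

Under these assumptions the recomputed throughput agrees with the reported value, which suggests the qps figure reflects wall-clock time for the whole run rather than GPU compute time alone.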

commented

To add: paddle=2.4.2, tensorrt=8.5.2

If you measure with trtexec, you can estimate the FPS from GPU Compute Time.

(To reproduce our numbers, you can use our benchmarking code: https://github.com/lyuwenyu/RT-DETR/tree/main/benchmark)
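A minimal sketch of that estimate, assuming FPS is taken as 1000 divided by the mean GPU Compute Time in milliseconds (the thread does not say whether mean, median, or another statistic was used). The helper name `fps_from_trtexec_log` and the regular expression are illustrative and are not part of the RT-DETR benchmark code.

```python
import re

# Two lines copied from the trtexec summary posted in this issue (first line shortened).
SAMPLE_LOG = """\
[02/23/2024-03:07:42] [I] GPU Compute Time: min = 5.07678 ms, max = 6.48608 ms, mean = 5.22236 ms, median = 5.20569 ms
[02/23/2024-03:07:42] [I] Total GPU Compute Time: 2.59029 s
"""

def fps_from_trtexec_log(log_text: str) -> float:
    """Estimate FPS as 1000 / mean GPU Compute Time (ms). Hypothetical helper."""
    # Match the per-query line; the "Total GPU Compute Time" line has no "mean =".
    match = re.search(r"GPU Compute Time:.*?mean = ([0-9.]+) ms", log_text)
    if match is None:
        raise ValueError("no 'GPU Compute Time ... mean' entry found in log")
    mean_ms = float(match.group(1))
    return 1000.0 / mean_ms

estimated_fps = fps_from_trtexec_log(SAMPLE_LOG)
# Per-image GPU compute time that a 217 FPS figure would correspond to.
latency_for_217_fps_ms = 1000.0 / 217.0

print(f"estimated FPS from mean GPU compute time: {estimated_fps:.1f}")
print(f"217 FPS corresponds to {latency_for_217_fps_ms:.2f} ms per image")
```

With the log above this gives roughly 191 FPS, while 217 FPS would need a mean GPU compute time of about 4.6 ms, so the run posted here is still slower than the reported figure even when only GPU compute time is counted.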

commented

@lyuwenyu Is pycuda missing from trtinfer.py? Also, how should the script be called?

Hi, have you measured the inference latency of yolov8s? On a T4, v8s comes out faster no matter how I test it... I don't know what is going wrong.

commented

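@Sssssd I have not yet been able to reproduce the RT-DETR speed from the paper, and I have not tested YOLO. I wonder whether the server could be the cause.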

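I benchmarked models trained on the COCO dataset. With trtexec, the GPU Compute Time matches the paper's results fairly well, but with trtinfer the gap still does not open up: v8s matches the paper fairly well, yet RT-DETR stays very close to v8s and does not show the large advantage reported in the paper. On my own 10-class dataset, RT-DETR has essentially no advantage even with trtexec and is about the same as v8s, while the trtinfer numbers barely change. I don't know what the problem is, and it is giving me a headache.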

commented

@Sssssd So you mean you reproduced the paper's FPS using their open-source COCO weights? My own reproduction with r18 is off by quite a lot.

I can only reproduce the paper's FPS with trtexec. I have not reproduced it with trtinfer yet, and I am not sure whether I am using it incorrectly.