open-mmlab / mmdeploy

OpenMMLab Model Deployment Framework

Home Page: https://mmdeploy.readthedocs.io/en/latest/

[Bug] After exporting the DINO model to ONNX, inference is much slower than the mmdetection PyTorch version. Could mmdeploy_onnxruntime_ops.dll be the problem?

wuzujiong opened this issue

Checklist

  • I have searched related issues but cannot get the expected help.
  • I have read the FAQ documentation but cannot get the expected help.
  • The bug has not been fixed in the latest version.

Describe the bug

Test code:
mmdetection:
from mmdet.apis import init_detector, inference_detector
import time
config_file = 'work_dirs/custom_dino-4scale_r50/custom_dino-4scale_r50.py'
checkpoint_file = 'work_dirs/custom_dino-4scale_r50/epoch_16.pth'
model = init_detector(config_file, checkpoint_file, device='cuda:0')  # or device='cpu'

for i in range(100):
    t = time.time()
    result = inference_detector(model, 'demo/demo.jpg')
    print(time.time() - t)
Output: the average latency is about 70 ms.
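
For comparison, a fairer PyTorch-side measurement warms the model up first and synchronizes the GPU around the timed region, since the first iterations pay CUDA context and cuDNN setup costs. A minimal sketch, reusing the model from above (the warm-up and iteration counts are arbitrary choices, not from the original report):

import time
import torch
from mmdet.apis import inference_detector

# Warm-up: the first runs include CUDA context creation and cuDNN autotuning.
for _ in range(10):
    inference_detector(model, 'demo/demo.jpg')

torch.cuda.synchronize()  # drain queued GPU work before starting the clock
t0 = time.time()
for _ in range(100):
    inference_detector(model, 'demo/demo.jpg')
torch.cuda.synchronize()  # wait for the last GPU kernels before stopping the clock
print('avg latency: %.1f ms' % ((time.time() - t0) / 100 * 1000))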

onnxruntime:
import onnxruntime as ort
import torchvision.transforms as T
from PIL import Image
import torch
import time

image_transforms = T.Compose([
    T.Resize((800, 1067), interpolation=T.InterpolationMode.BICUBIC),
    # T.CenterCrop(504),
    T.ToTensor(),
    T.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])
opt = ort.SessionOptions()
opt.register_custom_ops_library(r"mmdeploy/lib/mmdeploy_onnxruntime_ops.dll")
opt.enable_profiling = True

ort_sess = ort.InferenceSession('D:/DL/mmdeploy-main/dino_detections/end2end.onnx', sess_options=opt, providers=['CUDAExecutionProvider'])

input_dims = ort_sess.get_inputs()

test_img = Image.open('test.bmp').convert('RGB')
t_img = image_transforms(test_img)
t_img = torch.unsqueeze(t_img, dim=0)

for i in range(100):
    t1 = time.time()
    outputs = ort_sess.run(None, {'input': t_img.numpy()})
    print(time.time() - t1)

The average latency is about 1.2 s.
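
A latency in the 1 s range on an RTX 3080 often means the session silently fell back to the CPU execution provider, for example when the onnxruntime package is installed instead of onnxruntime-gpu, or when unsupported nodes are partitioned onto CPU. Note also that opt.enable_profiling = True adds some overhead of its own. A minimal sketch to check which providers are actually active and to time only warmed-up runs, reusing the ort_sess and t_img from above:

import time
import onnxruntime as ort

print(ort.get_available_providers())  # providers this onnxruntime build supports
print(ort_sess.get_providers())       # providers this session actually uses

inputs = {'input': t_img.numpy()}
for _ in range(10):  # warm-up: first runs include kernel and memory setup
    ort_sess.run(None, inputs)

t0 = time.time()
for _ in range(100):
    ort_sess.run(None, inputs)
print('avg latency: %.1f ms' % ((time.time() - t0) / 100 * 1000))

If CUDAExecutionProvider does not appear in the second list, the model is running entirely on CPU and the 1.2 s figure measures CPU inference rather than the custom-op library.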

Reproduction

None

Environment

Runtime environment: Windows 10, RTX 3080 10 GB, CUDA 11.3
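
For completeness, this section of the issue template is normally filled with the output of mmdeploy's environment-collection script; assuming a standard mmdeploy checkout, something like:

python tools/check_env.py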

Error traceback

No response