Inference error

Question

wulouzhu opened this issue 2 years ago · comments

Hi:
I have successfully converted a Faster R-CNN model downloaded from the mmdetection model zoo to a TensorRT engine, but when I run inference_model I get the following error:
[2022-04-22 07:27:52.715] [mmdeploy] [info] [model.cpp:95] Register 'DirectoryModel'
2022-04-22 07:27:57,889 - mmdeploy - INFO - Successfully loaded tensorrt plugins from /root/workspace/mmdeploy/build/lib/libmmdeploy_tensorrt_ops.so
2022-04-22 07:27:57,889 - mmdeploy - INFO - Successfully loaded tensorrt plugins from /root/workspace/mmdeploy/build/lib/libmmdeploy_tensorrt_ops.so
/opt/conda/lib/python3.8/site-packages/mmdet-2.22.0-py3.8.egg/mmdet/datasets/utils.py:66: UserWarning: "ImageToTensor" pipeline is replaced by "DefaultFormatBundle" for batch inference. It is recommended to manually replace it in the test data pipeline in your config file.
warnings.warn(
#assertion/root/workspace/mmdeploy/csrc/backend_ops/tensorrt/batched_nms/trt_batched_nms.cpp,98
Aborted (core dumped)

Could you please tell me why it happened and how to deal with it? Thank you.

q.yao · Answer 1 · Sun Apr 24 2022 12:50:11 GMT+0800 (China Standard Time)

Hi, sorry for the late reply. Could you provide more detail about your environment? You can use https://github.com/open-mmlab/mmdeploy/blob/master/tools/check_env.py to check the environment.

wulouzhu · Answer 2 · Mon Apr 25 2022 15:39:53 GMT+0800 (China Standard Time)

@grimoire
The problem above happened because the mmdetection config was wrong, and I have now solved it. But when I moved on to running Mask R-CNN inference with Python, I got this error:
[2022-04-25 07:27:01.643] [mmdeploy] [info] [model.cpp:95] Register 'DirectoryModel'
2022-04-25 07:27:06,617 - mmdeploy - INFO - Successfully loaded tensorrt plugins from /root/workspace/mmdeploy/build/lib/libmmdeploy_tensorrt_ops.so
2022-04-25 07:27:06,617 - mmdeploy - INFO - Successfully loaded tensorrt plugins from /root/workspace/mmdeploy/build/lib/libmmdeploy_tensorrt_ops.so
/opt/conda/lib/python3.8/site-packages/mmdet-2.23.0-py3.8.egg/mmdet/datasets/utils.py:66: UserWarning: "ImageToTensor" pipeline is replaced by "DefaultFormatBundle" for batch inference. It is recommended to manually replace it in the test data pipeline in your config file.
warnings.warn(
Traceback (most recent call last):
File "mmdetection/demo/deploy.py", line 16, in <module>
result = inference_model(model_cfg, deploy_cfg, backend_files, img=img, device=device)
File "/root/workspace/mmdeploy/mmdeploy/apis/inference.py", line 51, in inference_model
result = task_processor.run_inference(model, model_inputs)
File "/root/workspace/mmdeploy/mmdeploy/codebase/mmdet/deploy/object_detection.py", line 199, in run_inference
return model(**model_inputs, return_loss=False, rescale=True)
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/root/workspace/mmdeploy/mmdeploy/codebase/mmdet/deploy/object_detection_model.py", line 202, in forward
outputs = End2EndModel.__clear_outputs(outputs)
File "/root/workspace/mmdeploy/mmdeploy/codebase/mmdet/deploy/object_detection_model.py", line 110, in __clear_outputs
outputs[output_id][i] = test_outputs[output_id][i, inds, ...]
RuntimeError: CUDA error: misaligned address
terminate called after throwing an instance of 'c10::Error'
what(): CUDA error: misaligned address
Exception raised from create_event_internal at /opt/conda/conda-bld/pytorch_1614378062065/work/c10/cuda/CUDACachingAllocator.cpp:733 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7f4167cbb2f2 in /opt/conda/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const, char const, unsigned int, std::string const&) + 0x5b (0x7f4167cb867b in /opt/conda/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #2: c10::cuda::CUDACachingAllocator::raw_delete(void) + 0x809 (0x7f4167f14219 in /opt/conda/lib/python3.8/site-packages/torch/lib/libc10_cuda.so)
frame #3: c10::TensorImpl::release_resources() + 0x54 (0x7f4167ca33a4 in /opt/conda/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #4: + 0x6e0d9a (0x7f41b4ae8d9a in /opt/conda/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #5: + 0x6e0e31 (0x7f41b4ae8e31 in /opt/conda/lib/python3.8/site-packages/torch/lib/libtorch_python.so)

frame #17: __libc_start_main + 0xf3 (0x7f41daa0f0b3 in /usr/lib/x86_64-linux-gnu/libc.so.6)

My environment, built from the Dockerfile, is as follows (output of tools/check_env.py):
2022-04-25 07:37:39,870 - mmdeploy - INFO -

2022-04-25 07:37:39,870 - mmdeploy - INFO - Environmental information
fatal: not a git repository (or any of the parent directories): .git
2022-04-25 07:37:41,283 - mmdeploy - INFO - sys.platform: linux
2022-04-25 07:37:41,283 - mmdeploy - INFO - Python: 3.8.12 (default, Oct 12 2021, 13:49:34) [GCC 7.5.0]
2022-04-25 07:37:41,283 - mmdeploy - INFO - CUDA available: True
2022-04-25 07:37:41,283 - mmdeploy - INFO - GPU 0: Quadro RTX 6000
2022-04-25 07:37:41,283 - mmdeploy - INFO - CUDA_HOME: /usr/local/cuda
2022-04-25 07:37:41,283 - mmdeploy - INFO - NVCC: Build cuda_11.3.r11.3/compiler.29745058_0
2022-04-25 07:37:41,283 - mmdeploy - INFO - GCC: gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
2022-04-25 07:37:41,283 - mmdeploy - INFO - PyTorch: 1.8.0
2022-04-25 07:37:41,284 - mmdeploy - INFO - PyTorch compiling details: PyTorch built with:

GCC 7.3
C++ Version: 201402
Intel(R) Math Kernel Library Version 2020.0.2 Product Build 20200624 for Intel(R) 64 architecture applications
Intel(R) MKL-DNN v1.7.0 (Git Hash 7aed236906b1f7a05c0917e5257a1af05e9ff683)
OpenMP 201511 (a.k.a. OpenMP 4.5)
NNPACK is enabled
CPU capability usage: AVX2
CUDA Runtime 10.2
NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
CuDNN 7.6.5
Magma 2.5.2
Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=10.2, CUDNN_VERSION=7.6.5, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.8.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,

2022-04-25 07:37:41,284 - mmdeploy - INFO - TorchVision: 0.9.0
2022-04-25 07:37:41,284 - mmdeploy - INFO - OpenCV: 4.5.5
2022-04-25 07:37:41,284 - mmdeploy - INFO - MMCV: 1.4.0
2022-04-25 07:37:41,284 - mmdeploy - INFO - MMCV Compiler: GCC 7.3
2022-04-25 07:37:41,284 - mmdeploy - INFO - MMCV CUDA Compiler: 10.2
2022-04-25 07:37:41,284 - mmdeploy - INFO - MMDeploy: 0.4.0+
2022-04-25 07:37:41,284 - mmdeploy - INFO -

2022-04-25 07:37:41,284 - mmdeploy - INFO - Backend information
[2022-04-25 07:37:41.475] [mmdeploy] [info] [model.cpp:95] Register 'DirectoryModel'
2022-04-25 07:37:41,542 - mmdeploy - INFO - onnxruntime: 1.8.1 ops_is_avaliable : True
2022-04-25 07:37:41,543 - mmdeploy - INFO - tensorrt: 7.2.3.4 ops_is_avaliable : True
2022-04-25 07:37:41,543 - mmdeploy - INFO - ncnn: None ops_is_avaliable : False
2022-04-25 07:37:41,544 - mmdeploy - INFO - pplnn_is_avaliable: False
2022-04-25 07:37:41,544 - mmdeploy - INFO - openvino_is_avaliable: False
2022-04-25 07:37:41,544 - mmdeploy - INFO -

2022-04-25 07:37:41,544 - mmdeploy - INFO - Codebase information
2022-04-25 07:37:41,545 - mmdeploy - INFO - mmdet: 2.23.0
2022-04-25 07:37:41,545 - mmdeploy - INFO - mmseg: None
2022-04-25 07:37:41,546 - mmdeploy - INFO - mmcls: None
2022-04-25 07:37:41,546 - mmdeploy - INFO - mmocr: None
2022-04-25 07:37:41,546 - mmdeploy - INFO - mmedit: None
2022-04-25 07:37:41,546 - mmdeploy - INFO - mmdet3d: None
2022-04-25 07:37:41,546 - mmdeploy - INFO - mmpose: None

Looking forward to your reply!

q.yao · Answer 3 · Mon Apr 25 2022 16:38:14 GMT+0800 (China Standard Time)

What is your host CUDA driver version? The MMDeploy in the Docker image is built with nvcc==11.3, but your PyTorch and MMCV are built with CUDA 10.2.
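
To spot this kind of mismatch quickly, here is a minimal Python sketch that pulls the CUDA versions out of the check_env.py output. The function names and regular expressions are my own assumptions based on the log lines pasted in this thread; they are not part of MMDeploy. A mismatch is only a hint worth investigating, not proof of a problem.

```python
import re


def cuda_versions(env_text):
    """Collect the CUDA versions reported by the different components."""
    # ASSUMPTION: these patterns follow the check_env.py lines shown in this thread.
    patterns = {
        "nvcc": r"NVCC: Build cuda_(\d+\.\d+)",
        "pytorch": r"CUDA Runtime (\d+\.\d+)",
        "mmcv": r"MMCV CUDA Compiler: (\d+\.\d+)",
    }
    found = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, env_text)
        if match:
            found[name] = match.group(1)
    return found


def is_consistent(versions):
    """True when every component that reported a version reports the same one."""
    return len(set(versions.values())) <= 1


# Three lines copied from the environment report above.
sample = """\
NVCC: Build cuda_11.3.r11.3/compiler.29745058_0
CUDA Runtime 10.2
MMCV CUDA Compiler: 10.2
"""
versions = cuda_versions(sample)
print(versions, "consistent:", is_consistent(versions))
```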

wulouzhu · Answer 4 · Mon Apr 25 2022 17:00:28 GMT+0800 (China Standard Time)

@grimoire
My host CUDA driver is 510.54. My Docker image is built from https://github.com/open-mmlab/mmdeploy/blob/master/docker/GPU/Dockerfile.
In the Dockerfile, PyTorch and MMCV are built with CUDA 10.2:
FROM nvcr.io/nvidia/tensorrt:21.04-py3

ARG CUDA=10.2
ARG PYTHON_VERSION=3.8
ARG TORCH_VERSION=1.8.0
ARG TORCHVISION_VERSION=0.9.0
ARG ONNXRUNTIME_VERSION=1.8.1
ARG MMCV_VERSION=1.4.0
ARG PPLCV_VERSION=0.6.2
ENV FORCE_CUDA="1"

Is the Dockerfile wrong?

wulouzhu · Answer 5 · Mon Apr 25 2022 17:18:56 GMT+0800 (China Standard Time)

At first, I built a Docker image with PyTorch and MMCV built with CUDA 11.3.
root@07231d93287f:~/workspace# python mmdeploy/tools/check_env.py
2022-04-25 09:11:44,986 - mmdeploy - INFO -

2022-04-25 09:11:44,986 - mmdeploy - INFO - Environmental information
fatal: not a git repository (or any of the parent directories): .git
2022-04-25 09:11:46,632 - mmdeploy - INFO - sys.platform: linux
2022-04-25 09:11:46,633 - mmdeploy - INFO - Python: 3.8.12 (default, Oct 12 2021, 13:49:34) [GCC 7.5.0]
2022-04-25 09:11:46,633 - mmdeploy - INFO - CUDA available: True
2022-04-25 09:11:46,633 - mmdeploy - INFO - GPU 0: Quadro RTX 6000
2022-04-25 09:11:46,633 - mmdeploy - INFO - CUDA_HOME: /usr/local/cuda
2022-04-25 09:11:46,633 - mmdeploy - INFO - NVCC: Build cuda_11.3.r11.3/compiler.29745058_0
2022-04-25 09:11:46,633 - mmdeploy - INFO - GCC: gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
2022-04-25 09:11:46,633 - mmdeploy - INFO - PyTorch: 1.10.0
2022-04-25 09:11:46,633 - mmdeploy - INFO - PyTorch compiling details: PyTorch built with:

GCC 7.3
C++ Version: 201402
Intel(R) oneAPI Math Kernel Library Version 2022.0-Product Build 20211112 for Intel(R) 64 architecture applications
Intel(R) MKL-DNN v2.2.3 (Git Hash 7336ca9f055cf1bfa13efb658fe15dc9b41f0740)
OpenMP 201511 (a.k.a. OpenMP 4.5)
LAPACK is enabled (usually provided by MKL)
NNPACK is enabled
CPU capability usage: AVX512
CUDA Runtime 11.3
NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
CuDNN 8.2
Magma 2.5.2
Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.3, CUDNN_VERSION=8.2.0, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.10.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,

2022-04-25 09:11:46,634 - mmdeploy - INFO - TorchVision: 0.11.0
2022-04-25 09:11:46,634 - mmdeploy - INFO - OpenCV: 4.5.5
2022-04-25 09:11:46,634 - mmdeploy - INFO - MMCV: 1.4.0
2022-04-25 09:11:46,634 - mmdeploy - INFO - MMCV Compiler: GCC 7.3
2022-04-25 09:11:46,634 - mmdeploy - INFO - MMCV CUDA Compiler: 11.3
2022-04-25 09:11:46,634 - mmdeploy - INFO - MMDeploy: 0.4.0+
2022-04-25 09:11:46,634 - mmdeploy - INFO -

2022-04-25 09:11:46,635 - mmdeploy - INFO - Backend information
[2022-04-25 09:11:46.841] [mmdeploy] [info] [model.cpp:95] Register 'DirectoryModel'
2022-04-25 09:11:46,915 - mmdeploy - INFO - onnxruntime: 1.8.1 ops_is_avaliable : True
2022-04-25 09:11:46,916 - mmdeploy - INFO - tensorrt: 7.2.3.4 ops_is_avaliable : True
2022-04-25 09:11:46,916 - mmdeploy - INFO - ncnn: None ops_is_avaliable : False
2022-04-25 09:11:46,916 - mmdeploy - INFO - pplnn_is_avaliable: False
2022-04-25 09:11:46,917 - mmdeploy - INFO - openvino_is_avaliable: False
2022-04-25 09:11:46,917 - mmdeploy - INFO -

2022-04-25 09:11:46,917 - mmdeploy - INFO - Codebase information
2022-04-25 09:11:46,918 - mmdeploy - INFO - mmdet: 2.23.0
2022-04-25 09:11:46,918 - mmdeploy - INFO - mmseg: None
2022-04-25 09:11:46,918 - mmdeploy - INFO - mmcls: None
2022-04-25 09:11:46,918 - mmdeploy - INFO - mmocr: None
2022-04-25 09:11:46,918 - mmdeploy - INFO - mmedit: None
2022-04-25 09:11:46,918 - mmdeploy - INFO - mmdet3d: None
2022-04-25 09:11:46,918 - mmdeploy - INFO - mmpose: None

But when I converted the Mask R-CNN model to a TensorRT engine, I got this error:
[TensorRT] WARNING: Output type must be INT32 for shape outputs
[TensorRT] WARNING: Output type must be INT32 for shape outputs
[TensorRT] WARNING: Output type must be INT32 for shape outputs
[TensorRT] WARNING: Output type must be INT32 for shape outputs
[TensorRT] INFO: Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[TensorRT] ERROR: ../builder/cudnnBuilderUtils.cpp (408) - Cuda Error in findFastestTactic: 700 (an illegal memory access was encountered)
[TensorRT] ERROR: ../rtSafe/safeRuntime.cpp (32) - Cuda Error in free: 700 (an illegal memory access was encountered)
terminate called after throwing an instance of 'nvinfer1::CudaError'
what(): std::exception
2022-04-25 09:10:45,266 - mmdeploy - ERROR - onnx2tensorrt of mmdetection/checkpoints/mask_rcnn/end2end.onnx failed.

So I switched back to CUDA 10.2, which is what the project provides, without making any changes.

q.yao · Answer 6 · Mon Apr 25 2022 17:25:04 GMT+0800 (China Standard Time)

Er, I want to know the CUDA driver version of your host (outside the Docker container). If the CUDA version in the container is higher than what your host driver supports, you might get unexpected results.
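
As a rough illustration of that check, the Python sketch below compares a host driver version against the minimum driver a CUDA toolkit needs. The MIN_DRIVER table and the helper names are assumptions for illustration: the two entries are the values I recall from NVIDIA's CUDA release notes for Linux, so please verify them there before relying on them.

```python
# Minimum Linux driver per CUDA toolkit.
# ASSUMPTION: values recalled from NVIDIA's CUDA release notes; verify them
# against the official table before relying on this sketch.
MIN_DRIVER = {
    "10.2": (440, 33),
    "11.3": (465, 19),
}


def parse_version(text):
    """Turn '510.54' into (510, 54) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def driver_supports(driver, cuda):
    """True if the host driver is at least the minimum for this toolkit."""
    minimum = MIN_DRIVER[cuda]
    # Compare only as many components as the table entry has.
    return parse_version(driver)[:len(minimum)] >= minimum


print(driver_supports("510.54", "11.3"))
print(driver_supports("440.33", "11.3"))
```

On the host, the driver version is the one reported by nvidia-smi.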

wulouzhu · Answer 7 · Mon Apr 25 2022 17:29:35 GMT+0800 (China Standard Time)

I have answered the question above

q.yao · Answer 8 · Mon Apr 25 2022 18:38:43 GMT+0800 (China Standard Time)

@AllentDan

AllentDan · Answer 9 · Tue Apr 26 2022 09:20:31 GMT+0800 (China Standard Time)

Hi, @wulouzhu. Could you please provide the conversion scripts?

wulouzhu · Answer 10 · Tue Apr 26 2022 09:36:48 GMT+0800 (China Standard Time)

@AllentDan
python ${MMDEPLOY_DIR}/tools/deploy.py ${MMDEPLOY_DIR}/configs/mmdet/detection/detection_tensorrt_dynamic-320x320-1344x1344.py ${MMDET_DIR}/configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.py ${CHECKPOINT_DIR}/mask_rcnn_r50_fpn_1x_coco_20200205-d4b0c5d6.pth ${INPUT_IMG} --work-dir ${WORK_DIR} --device cuda:0 --dump-info

When I use CUDA 10.2 for PyTorch and MMCV, the conversion succeeds but inference fails. When I use CUDA 11.3 for PyTorch and MMCV, the conversion itself fails. You can find the error details above.

AllentDan · Answer 11 · Tue Apr 26 2022 12:14:26 GMT+0800 (China Standard Time)

That's weird. I tested it successfully a minute ago with mmdeploy Dockerfile with the following commands:

docker run --gpus all -it -p 8081:8082 gpu_test
pip install mmdet
cd ~/workspace && git clone https://github.com/open-mmlab/mmdetection.git && cd mmdeploy
python tools/deploy.py configs/mmdet/detection/detection_tensorrt_dynamic-320x320-1344x1344.py  ../mmdetection/configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.py https://download.openmmlab.com/mmdetection/v2.0/mask_rcnn/mask_rcnn_r50_fpn_1x_coco/mask_rcnn_r50_fpn_1x_coco_20200205-d4b0c5d6.pth ../mmdetection/demo/demo.jpg --work-dir ../work-dir/mmdet --device cuda:0 --dump-info

AllentDan · Answer 12 · Tue Apr 26 2022 12:25:55 GMT+0800 (China Standard Time)

I tried again with the latest mmdeploy and also hit the error. It might be a bug introduced by the latest features. We will fix it ASAP. If it is urgent, a previous version may work for you.

wulouzhu · Answer 13 · Tue Apr 26 2022 12:33:30 GMT+0800 (China Standard Time)

@AllentDan
Which CUDA version did you use for PyTorch and MMCV?

AllentDan · Answer 14 · Tue Apr 26 2022 12:36:27 GMT+0800 (China Standard Time)

Just the default version in the Dockerfile. It is fine to use a different CUDA version under the conda env.

wulouzhu · Answer 15 · Tue Apr 26 2022 12:48:05 GMT+0800 (China Standard Time)

@AllentDan
Could you please try CUDA 11.3 with PyTorch 1.10 and MMCV in the Dockerfile? I want to know whether you hit the same error as I did when converting the Mask R-CNN model. Thanks!

ERROR: ../builder/cudnnBuilderUtils.cpp (408) - C

AllentDan · Answer 16 · Fri Apr 29 2022 13:56:01 GMT+0800 (China Standard Time)

Sorry for the late reply. I am working on it and will inform you as soon as it gets fixed.

AllentDan · Answer 17 · Fri May 06 2022 15:20:10 GMT+0800 (China Standard Time)

Hi there. I tried to reproduce your error today, and it turned out that we were using the wrong configuration file. For Mask R-CNN the expected file is configs/mmdet/instance-seg/instance-seg_tensorrt_dynamic-320x320-1344x1344.py; configs/mmdet/detection/detection_tensorrt_dynamic-320x320-1344x1344.py is not meant for the instance segmentation task.
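
To keep the two configs from being mixed up, one option is to pick the deploy config from the task name. The sketch below only assembles the tools/deploy.py command line used earlier in this thread; DEPLOY_CFGS and build_deploy_cmd are hypothetical helper names, not part of MMDeploy, and the paths in the usage lines are the ones from the commands above.

```python
# Deploy configs named in this thread, keyed by task.
DEPLOY_CFGS = {
    "detection": "configs/mmdet/detection/detection_tensorrt_dynamic-320x320-1344x1344.py",
    "instance-seg": "configs/mmdet/instance-seg/instance-seg_tensorrt_dynamic-320x320-1344x1344.py",
}


def build_deploy_cmd(task, model_cfg, checkpoint, img, work_dir, device="cuda:0"):
    """Assemble the tools/deploy.py command for the given task (hypothetical helper)."""
    if task not in DEPLOY_CFGS:
        raise ValueError(f"unknown task: {task!r}")
    return [
        "python", "tools/deploy.py",
        DEPLOY_CFGS[task], model_cfg, checkpoint, img,
        "--work-dir", work_dir,
        "--device", device,
        "--dump-info",
    ]


# Mask R-CNN is an instance segmentation model, so it needs the instance-seg config.
cmd = build_deploy_cmd(
    "instance-seg",
    "../mmdetection/configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.py",
    "https://download.openmmlab.com/mmdetection/v2.0/mask_rcnn/mask_rcnn_r50_fpn_1x_coco/mask_rcnn_r50_fpn_1x_coco_20200205-d4b0c5d6.pth",
    "../mmdetection/demo/demo.jpg",
    "../work-dir/mmdet",
)
print(" ".join(cmd))
```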

wulouzhu · Answer 18 · Fri May 06 2022 16:47:55 GMT+0800 (China Standard Time)

That's kind of you. I have solved the problem. But I have another question, about the meaning of opt_shape in instance-seg_tensorrt_dynamic-320x320-1344x1344.py.

AllentDan · Answer 19 · Fri May 06 2022 19:39:23 GMT+0800 (China Standard Time)

Well, max_shape and min_shape set the range of input resolutions allowed at inference time.

wulouzhu · Answer 20 · Fri May 06 2022 19:49:12 GMT+0800 (China Standard Time)

Yes, I know that. What about opt_shape?

AllentDan · Answer 21 · Sat May 07 2022 10:58:27 GMT+0800 (China Standard Time)

It is just the shape that TensorRT will optimize for.
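
For illustration, here is a minimal Python sketch of how the three shapes sit together in a deploy config. The layout is modeled on MMDeploy's TensorRT configs, but treat the exact keys as an assumption and compare with the real file. The min and max values follow the 320x320-1344x1344 in the file name; the opt_shape value is only an example. The shape_in_range helper is hypothetical.

```python
# ASSUMPTION: layout modeled on MMDeploy's TensorRT deploy configs; check the
# real config file for the exact keys and values.
backend_config = dict(
    type="tensorrt",
    model_inputs=[
        dict(
            input_shapes=dict(
                input=dict(
                    min_shape=[1, 3, 320, 320],     # smallest input the engine accepts
                    opt_shape=[1, 3, 800, 1344],    # example value: shape TensorRT tunes for
                    max_shape=[1, 3, 1344, 1344],   # largest input the engine accepts
                )
            )
        )
    ],
)


def shape_in_range(shape, spec):
    """True if every dimension lies between min_shape and max_shape (hypothetical helper)."""
    if len(shape) != len(spec["min_shape"]):
        return False
    return all(
        lo <= dim <= hi
        for lo, dim, hi in zip(spec["min_shape"], shape, spec["max_shape"])
    )


spec = backend_config["model_inputs"][0]["input_shapes"]["input"]
print(shape_in_range([1, 3, 800, 1344], spec))
print(shape_in_range([1, 3, 1600, 1600], spec))
```

Any input between min_shape and max_shape will run; inputs close to opt_shape are the ones the engine is tuned for, so it makes sense to set opt_shape to the resolution you expect most often.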