[Bug] Serialization assertion magicTagRead == magicTag failed and Assertion failed: creator && "Plugin not found, are the plugin name, version, and namespace correct?"

vxgu86 opened this issue 5 months ago · comments

Checklist

1. I have searched related issues but cannot get the expected help.
2. I have read the FAQ documentation but cannot get the expected help.
3. The bug has not been fixed in the latest version.

Describe the bug

I ran the deploy.py script on a PC to convert an RTMDet detection model to ONNX and a TensorRT engine with TensorRT 8.6.1.6,
then tested the resulting model directory on a Jetson Xavier with TensorRT 8.2.1.8 by running

object_detection cuda mmdep_engine_onpc/ visible.JPG

but got the following error:

[2024-01-12 12:53:52.351] [mmdeploy] [info] [model.cpp:35] [DirectoryModel] Load model: "/home/vxking/mmdeploy_ws/work_dir/mmdep_engine_onpc/"
[2024-01-12 12:53:54.250] [mmdeploy] [error] [trt_net.cpp:28] TRTNet: 1: [stdArchiveReader.cpp::StdArchiveReader::30] Error Code 1: Serialization (Serialization assertion magicTagRead == magicTag failed.Magic tag does not match)
[2024-01-12 12:53:54.251] [mmdeploy] [error] [trt_net.cpp:28] TRTNet: 4: [runtime.cpp::deserializeCudaEngine::50] Error Code 4: Internal Error (Engine deserialization failed.)
[2024-01-12 12:53:54.251] [mmdeploy] [error] [trt_net.cpp:75] failed to deserialize TRT CUDA engine
[2024-01-12 12:53:54.252] [mmdeploy] [error] [net_module.cpp:54] Failed to create Net backend: tensorrt, config: { "context": { "device": "<any>", "model": "<any>", "stream": "<any>" }, "input": [ "prep_output" ], "input_map": { "img": "input" }, "is_batched": true, "module": "Net", "name": "rtmdet", "output": [ "infer_output" ], "output_map": {}, "type": "Task" }
[2024-01-12 12:53:54.252] [mmdeploy] [error] [task.cpp:99] error parsing config: { "context": { "device": "<any>", "model": "<any>", "stream": "<any>" }, "input": [ "prep_output" ], "input_map": { "img": "input" }, "is_batched": true, "module": "Net", "name": "rtmdet", "output": [ "infer_output" ], "output_map": {}, "type": "Task" }
Segmentation fault (core dumped)
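
As far as I understand, a serialized TensorRT engine is generally tied to the TensorRT version (and GPU) it was built with, so the 8.6.1.6 vs 8.2.1.8 mismatch may by itself explain the magic tag error. A minimal sketch of a pre-flight check, assuming the two version strings are known in advance; the helper names `major_minor_patch` and `same_tensorrt_release` are hypothetical and not part of MMDeploy or TensorRT:

```python
def major_minor_patch(version: str) -> tuple:
    """Return the first three numeric components of a version string like '8.2.1.8'."""
    parts = version.strip().split(".")
    return tuple(int(p) for p in parts[:3])


def same_tensorrt_release(build_version: str, runtime_version: str) -> bool:
    """True when both strings name the same major.minor.patch release."""
    return major_minor_patch(build_version) == major_minor_patch(runtime_version)


# Versions taken from this report: engine built on the PC, loaded on the Jetson.
BUILD_VERSION = "8.6.1.6"
RUNTIME_VERSION = "8.2.1.8"

if not same_tensorrt_release(BUILD_VERSION, RUNTIME_VERSION):
    # Assumption: the safest fix is to rebuild the engine on the target device.
    print(
        f"Engine built with TensorRT {BUILD_VERSION} should not be expected to "
        f"load on TensorRT {RUNTIME_VERSION}; rebuild it on the target device."
    )
```

If the check fails, only the ONNX file would be copied to the Jetson and the engine built there.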

I also tried converting the ONNX model exported on the PC with trtexec:

trtexec --onnx=work_dir/end2end.onnx --saveEngine=work_dir/end.engine
and got errors like this:
No importer registered for op: TRTBatchedNMS. Attempting to import as plugin.
[01/12/2024-13:06:26] [I] [TRT] Searching for plugin: TRTBatchedNMS, plugin_version: 1, plugin_namespace:
[01/12/2024-13:06:26] [E] [TRT] ModelImporter.cpp:773: While parsing node number 626 [TRTBatchedNMS -> "/TRTBatchedNMS_output_0"]:
[01/12/2024-13:06:26] [E] [TRT] ModelImporter.cpp:774: --- Begin node ---
[01/12/2024-13:06:26] [E] [TRT] ModelImporter.cpp:775: input: "/Unsqueeze_11_output_0"
input: "/Sigmoid_output_0"
output: "/TRTBatchedNMS_output_0"
output: "/TRTBatchedNMS_output_1"
name: "/TRTBatchedNMS"
op_type: "TRTBatchedNMS"
attribute {
name: "background_label_id"
i: -1
type: INT
}
attribute {
name: "clip_boxes"
i: 0
type: INT
}
attribute {
name: "iou_threshold"
f: 0.65
type: FLOAT
}
attribute {
name: "is_normalized"
i: 0
type: INT
}
attribute {
name: "keep_topk"
i: 200
type: INT
}
attribute {
name: "num_classes"
i: 24
type: INT
}
attribute {
name: "return_index"
i: 0
type: INT
}
attribute {
name: "score_threshold"
f: 0.001
type: FLOAT
}
attribute {
name: "topk"
i: 5000
type: INT
}
domain: "mmdeploy"

[01/12/2024-13:06:26] [E] [TRT] ModelImporter.cpp:776: --- End node ---
[01/12/2024-13:06:26] [E] [TRT] ModelImporter.cpp:779: ERROR: builtin_op_importers.cpp:4870 In function importFallbackPluginImporter:
[8] Assertion failed: creator && "Plugin not found, are the plugin name, version, and namespace correct?"
[01/12/2024-13:06:26] [E] Failed to parse onnx file
[01/12/2024-13:06:26] [I] Finish parsing network model
[01/12/2024-13:06:26] [E] Parsing model failed
[01/12/2024-13:06:26] [E] Failed to create engine from model.
[01/12/2024-13:06:26] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v8201] #
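
The parser error above suggests that trtexec did not have the MMDeploy custom op library loaded when it reached the TRTBatchedNMS node. A minimal sketch of assembling a trtexec call that passes the library explicitly, assuming the trtexec shipped with TensorRT 8.2 accepts `--plugins` (worth confirming with `trtexec --help`) and that the custom ops were built to `build/lib/libmmdeploy_tensorrt_ops.so`; that library path and the helper name `trtexec_command` are assumptions, not taken from this report:

```python
from typing import List


def trtexec_command(
    onnx_path: str,
    engine_path: str,
    plugin_lib: str = "",
    trtexec: str = "/usr/src/tensorrt/bin/trtexec",
) -> List[str]:
    """Build the trtexec argument list, adding --plugins when a library is given."""
    cmd = [trtexec, f"--onnx={onnx_path}", f"--saveEngine={engine_path}"]
    if plugin_lib:
        # Assumption: this trtexec build supports --plugins for loading a .so.
        cmd.append(f"--plugins={plugin_lib}")
    return cmd


cmd = trtexec_command(
    "/home/mmdeploy_ws/work_dir/mmdep_engine_onpc/end2end.onnx",
    "/home/mmdeploy_ws/work_dir/end.engine",
    # Hypothetical location of the MMDeploy TensorRT custom op library.
    plugin_lib="/home/mmdeploy_ws/mmdeploy/build/lib/libmmdeploy_tensorrt_ops.so",
)
# Print the command for inspection instead of running it.
print(" ".join(cmd))
```

The list form can be handed to `subprocess.run(cmd, check=True)` once the paths have been verified on the device.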

I also tried exporting the model to ONNX directly on the Jetson:

python tools/torch2onnx.py \
    configs/mmdet/detection/detection_tensorrt-fp16_dynamic-320x320-1344x1344.py \
    $PATH_TO_MMDET/configs/rtmdet/rtmdet_l_8xb32-300e_coco.py \
    /home/mmdeploy_ws/work_dir/epoch_300.pth \
    /home/mmdeploy_ws/aa.JPG \
    --work-dir work_dir_torch2onnx \
    --device cuda:0 \
    --log-level INFO

which produced warnings like this:

01/12 18:12:28 - mmengine - INFO - torch2onnx:
model_cfg: /home/mmdeploy_ws/mmdetection/configs/rtmdet/rtmdet_l_8xb32-300e_coco.py
deploy_cfg: configs/mmdet/detection/detection_tensorrt-fp16_dynamic-320x320-1344x1344.py
01/12 18:12:30 - mmengine - WARNING - Failed to search registry with scope "mmdet" in the "Codebases" registry tree. As a workaround, the current "Codebases" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmdet" is a correct scope, or whether the registry is initialized.
01/12 18:12:30 - mmengine - WARNING - Failed to search registry with scope "mmdet" in the "mmdet_tasks" registry tree. As a workaround, the current "mmdet_tasks" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmdet" is a correct scope, or whether the registry is initialized.
Loads checkpoint by local backend from path: /home/mmdeploy_ws/work_dir/epoch_300.pth
01/12 18:13:01 - mmengine - WARNING - DeprecationWarning: get_onnx_config will be deprecated in the future.
01/12 18:13:01 - mmengine - INFO - Export PyTorch model to ONNX: work_dir_torch2onnx/end2end.onnx.
01/12 18:13:01 - mmengine - WARNING - Can not find torch.nn.functional.scaled_dot_product_attention, function rewrite will not be applied
01/12 18:13:01 - mmengine - WARNING - Can not find torch._C._jit_pass_onnx_autograd_function_process, function rewrite will not be applied
01/12 18:13:01 - mmengine - WARNING - Can not find torch._C._jit_pass_onnx_deduplicate_initializers, function rewrite will not be applied
01/12 18:13:01 - mmengine - WARNING - Can not find mmdet.models.utils.transformer.PatchMerging.forward, function rewrite will not be applied
/home/mmdeploy_ws/mmdeploy/mmdeploy/core/optimizers/function_marker.py:160: TracerWarning: Converting a tensor to a Python integer might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
ys_shape = tuple(int(s) for s in ys.shape)
/home/vxking/archiconda3/envs/mmdeployy/lib/python3.6/site-packages/torch/functional.py:445: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /media/nvidia/NVME/pytorch/pytorch-v1.10.0/aten/src/ATen/native/TensorShape.cpp:2157.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
/home/mmdeploy_ws/mmdeploy/mmdeploy/mmcv/ops/nms.py:475: TracerWarning: Converting a tensor to a Python integer might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
int(scores.shape[-1]),
/home/vxking/mmdeploy_ws/mmdeploy/mmdeploy/mmcv/ops/nms.py:149: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
out_boxes = min(num_boxes, after_topk)
/home/vxking/mmdeploy_ws/mmdeploy/mmdeploy/mmcv/ops/nms.py:152: TracerWarning: Converting a tensor to a Python integer might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
(batch_size, out_boxes)).to(scores.device))
WARNING: The shape inference of mmdeploy::TRTBatchedNMS type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
WARNING: The shape inference of mmdeploy::TRTBatchedNMS type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
WARNING: The shape inference of mmdeploy::TRTBatchedNMS type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
WARNING: The shape inference of mmdeploy::TRTBatchedNMS type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
WARNING: The shape inference of mmdeploy::TRTBatchedNMS type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
WARNING: The shape inference of mmdeploy::TRTBatchedNMS type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
01/12 18:14:09 - mmengine - INFO - Execute onnx optimize passes.
01/12 18:14:12 - mmengine - INFO - torch2onnx finished. Results saved to work_dir_torch2onnx

Is there something wrong with the environment or with mmdeploy::TRTBatchedNMS? I have already followed the TensorRT custom operator installation procedure.

Reproduction

build/bin/object_detection cuda /home/vxking/mmdeploy_ws/work_dir/mmdep_engine_onpc/ /home/mmdeploy_ws/aa.JPG

/usr/src/tensorrt/bin/trtexec --onnx=/home/mmdeploy_ws/work_dir/mmdep_engine_onpc/end2end.onnx --saveEngine=/home/mmdeploy_ws/work_dir/end.engine

python tools/torch2onnx.py \
    configs/mmdet/detection/detection_tensorrt-fp16_dynamic-320x320-1344x1344.py \
    $PATH_TO_MMDET/configs/rtmdet/rtmdet_l_8xb32-300e_coco.py \
    /home/mmdeploy_ws/work_dir/epoch_300.pth \
    /home/mmdeploy_ws/aa.JPG \
    --work-dir work_dir_torch2onnx \
    --device cuda:0 \
    --log-level INFO

Is there something wrong with mmdeploy::TRTBatchedNMS? I have already followed the TensorRT custom operator installation procedure.

Environment

01/12 18:18:32 - mmengine - INFO - 

01/12 18:18:32 - mmengine - INFO - **********Environmental information**********
/bin/sh: 1: aarch64-conda_cos7-linux-gnu-gcc: not found
01/12 18:18:35 - mmengine - INFO - sys.platform: linux
01/12 18:18:35 - mmengine - INFO - Python: 3.6.7 | packaged by conda-forge | (default, Feb 24 2019, 02:17:42) [GCC 7.3.0]
01/12 18:18:35 - mmengine - INFO - CUDA available: True
01/12 18:18:35 - mmengine - INFO - numpy_random_seed: 2147483648
01/12 18:18:35 - mmengine - INFO - GPU 0: Xavier
01/12 18:18:35 - mmengine - INFO - CUDA_HOME: /usr/local/cuda-10.2
01/12 18:18:35 - mmengine - INFO - NVCC: Cuda compilation tools, release 10.2, V10.2.300
01/12 18:18:35 - mmengine - INFO - GCC: n/a
01/12 18:18:35 - mmengine - INFO - PyTorch: 1.10.0
01/12 18:18:35 - mmengine - INFO - PyTorch compiling details: PyTorch built with:
  - GCC 7.5
  - C++ Version: 201402
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: NO AVX
  - CUDA Runtime 10.2
  - NVCC architecture flags: -gencode;arch=compute_53,code=sm_53;-gencode;arch=compute_62,code=sm_62;-gencode;arch=compute_72,code=sm_72
  - CuDNN 8.2.1
    - Built with CuDNN 8.0
  - Build settings: BLAS_INFO=open, BUILD_TYPE=Release, CUDA_VERSION=10.2, CUDNN_VERSION=8.0.0, CXX_COMPILER=/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -DMISSING_ARM_VST1 -DMISSING_ARM_VLD1 -Wno-stringop-overflow, FORCE_FALLBACK_CUDA_MPI=1, LAPACK_INFO=open, TORCH_VERSION=1.10.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EIGEN_FOR_BLAS=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=OFF, USE_MKLDNN=OFF, USE_MPI=ON, USE_NCCL=0, USE_NNPACK=ON, USE_OPENMP=ON, 

01/12 18:18:35 - mmengine - INFO - TorchVision: 0.11.1
01/12 18:18:35 - mmengine - INFO - OpenCV: 4.9.0
01/12 18:18:35 - mmengine - INFO - MMEngine: 0.9.0
01/12 18:18:35 - mmengine - INFO - MMCV: 2.0.0
01/12 18:18:35 - mmengine - INFO - MMCV Compiler: GCC 7.5
01/12 18:18:35 - mmengine - INFO - MMCV CUDA Compiler: 10.2
01/12 18:18:35 - mmengine - INFO - MMDeploy: 1.3.0+8b19586
01/12 18:18:35 - mmengine - INFO - 

01/12 18:18:35 - mmengine - INFO - **********Backend information**********
01/12 18:18:35 - mmengine - INFO - tensorrt:	8.2.1.8
01/12 18:18:35 - mmengine - INFO - tensorrt custom ops:	Available
01/12 18:18:35 - mmengine - INFO - ONNXRuntime:	None
01/12 18:18:35 - mmengine - INFO - ONNXRuntime-gpu:	1.10.0
01/12 18:18:35 - mmengine - INFO - ONNXRuntime custom ops:	NotAvailable
01/12 18:18:35 - mmengine - INFO - pplnn:	None
01/12 18:18:35 - mmengine - INFO - ncnn:	None
01/12 18:18:35 - mmengine - INFO - snpe:	None
01/12 18:18:35 - mmengine - INFO - openvino:	None
01/12 18:18:35 - mmengine - INFO - torchscript:	1.10.0
01/12 18:18:35 - mmengine - INFO - torchscript custom ops:	NotAvailable
01/12 18:18:35 - mmengine - INFO - rknn-toolkit:	None
01/12 18:18:35 - mmengine - INFO - rknn-toolkit2:	None
01/12 18:18:35 - mmengine - INFO - ascend:	None
01/12 18:18:35 - mmengine - INFO - coreml:	None
01/12 18:18:35 - mmengine - INFO - tvm:	None
01/12 18:18:35 - mmengine - INFO - vacc:	None
01/12 18:18:35 - mmengine - INFO - 

01/12 18:18:35 - mmengine - INFO - **********Codebase information**********
01/12 18:18:35 - mmengine - INFO - mmdet:	3.2.0
01/12 18:18:35 - mmengine - INFO - mmseg:	None
01/12 18:18:35 - mmengine - INFO - mmpretrain:	None
01/12 18:18:35 - mmengine - INFO - mmocr:	None
01/12 18:18:35 - mmengine - INFO - mmagic:	None
01/12 18:18:35 - mmengine - INFO - mmdet3d:	None
01/12 18:18:35 - mmengine - INFO - mmpose:	None
01/12 18:18:35 - mmengine - INFO - mmrotate:	None
01/12 18:18:35 - mmengine - INFO - mmaction:	None
01/12 18:18:35 - mmengine - INFO - mmrazor:	None
01/12 18:18:35 - mmengine - INFO - mmyolo:	None

Error traceback

build/bin/object_detection cuda /home/vxking/mmdeploy_ws/work_dir/mmdep_engine_onpc/  /home/mmdeploy_ws/aa.JPG

/usr/src/tensorrt/bin/trtexec --onnx=/home/mmdeploy_ws/work_dir/mmdep_engine_onpc/end2end.onnx --saveEngine=/home/mmdeploy_ws/work_dir/end.engine

python tools/torch2onnx.py \
    configs/mmdet/detection/detection_tensorrt-fp16_dynamic-320x320-1344x1344.py \
    $PATH_TO_MMDET/configs/rtmdet/rtmdet_l_8xb32-300e_coco.py \
    /home/mmdeploy_ws/work_dir/epoch_300.pth \
    /home/mmdeploy_ws/aa.JPG \
    --work-dir work_dir_torch2onnx \
    --device cuda:0 \
    --log-level INFO
