open-mmlab / mmdeploy

OpenMMLab Model Deployment Framework

Home Page:https://mmdeploy.readthedocs.io/en/latest/

[Bug] [v0.14.0] Cannot relay 'img_metas' through torch.jit.trace

RiverLight4 opened this issue

Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. I have read the FAQ documentation but cannot get the expected help.
  • 3. The bug has not been fixed in the latest version.

Describe the bug

Hello,

I would like to export a model trained with RefineMask ( https://github.com/zhanggang001/RefineMask ) to ONNX. I ported RefineMask to MMDetection v2.28.2 and, using MMDeploy v0.14.0, first tried converting the model to TorchScript, but the conversion fails.
Tracing through the MMDetection/MMDeploy code, I found that 'img_metas' is not forwarded to the model when it is called by torch.jit.trace during conversion.
How can I pass both 'img' and 'img_metas' through torch.jit.trace?
The RefineMask head uses ['img_metas'][0]['ori_shape'] and ['img_metas'][0]['scale_factor'], so img_metas has to be relayed through the model.
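
One way to get non-tensor metadata past torch.jit.trace is to bind it to the module before tracing, so that forward() receives only the image tensor. The sketch below is a minimal illustration under assumptions: ToyMaskHead and MetaBoundModel are hypothetical names, not part of MMDetection or MMDeploy, and the toy head only multiplies the image by the first scale factor. Values bound this way are frozen into the trace as constants, so they would not change from image to image.

```python
import torch
from torch import nn


class ToyMaskHead(nn.Module):
    """Hypothetical stand-in for a head that reads img_metas."""

    def forward(self, img, img_metas):
        # Plain Python values read here are baked into the trace as constants.
        scale = float(img_metas[0]["scale_factor"][0])
        return img * scale


class MetaBoundModel(nn.Module):
    """Hypothetical wrapper: holds img_metas so forward() takes only a tensor."""

    def __init__(self, model, img_metas):
        super().__init__()
        self.model = model
        self.img_metas = img_metas  # ordinary Python attribute, not a traced input

    def forward(self, img):
        return self.model(img, self.img_metas)


# Shortened, illustrative metadata (values are assumptions for this sketch).
img_metas = [{"ori_shape": (427, 640, 3), "scale_factor": (2.0, 2.0, 2.0, 2.0)}]
wrapped = MetaBoundModel(ToyMaskHead(), img_metas).eval()

with torch.no_grad():
    # Only the image tensor is given to the tracer as an example input.
    traced = torch.jit.trace(wrapped, torch.ones(1, 3, 4, 4), strict=False)
    result = traced(torch.ones(1, 3, 2, 2))
```

Whether this fits RefineMask depends on whether fixed ori_shape and scale_factor values are acceptable for the exported model.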

info

  • RefineMask is currently implemented on MMDetection v2.3.0, and I think porting it to MMDetection v3.x would be much harder, so I tried v2.28.2. If model conversion is easier with v3.x, I will use it.
  • If a newer Python (such as 3.12) is recommended, I can use it. The only reason I use Python 3.7 is that RefineMask is officially tested with it.

Required inputs:

Data generated with LoadImageFromFile like:

{'img_metas': [[{'filename': '/workspaces/rm-test/mmdetection-2.28.2/demo/demo.jpg', 'ori_filename': '/workspaces/rm-test/mmdetection-2.28.2/demo/demo.jpg', 'ori_shape': (427, 640, 3), 'img_shape': tensor([ 800, 1216]), 'pad_shape': (800, 1216, 3), 'scale_factor': array([1.8734375, 1.8735363, 1.8734375, 1.8735363], dtype=float32), 'flip': False, 'flip_direction': None, 'img_norm_cfg': {'mean': array([123.675, 116.28 , 103.53 ], dtype=float32), 'std': array([58.395, 57.12 , 57.375], dtype=float32), 'to_rgb': True}}]], 
 'img': [tensor([[[[-1.8439, -1.7925, -1.7240,  ...,  0.0000,  0.0000,  0.0000],
          [-1.7754, -1.7412, -1.7069,  ...,  0.0000,  0.0000,  0.0000],
          [-1.6384, -1.6555, -1.6898,  ...,  0.0000,  0.0000,  0.0000],
          ...,
          [-1.0733, -1.1247, -1.2103,  ...,  0.0000,  0.0000,  0.0000],
          [-1.0390, -0.9877, -0.9020,  ...,  0.0000,  0.0000,  0.0000],
          [-1.0219, -0.9192, -0.7308,  ...,  0.0000,  0.0000,  0.0000]],

         [[-1.1429, -1.1078, -1.0553,  ...,  0.0000,  0.0000,  0.0000],
          [-1.0903, -1.0728, -1.0378,  ...,  0.0000,  0.0000,  0.0000],
          [-1.0028, -1.0028, -1.0028,  ...,  0.0000,  0.0000,  0.0000],
          ...,
          [-0.8627, -0.8978, -0.9503,  ...,  0.0000,  0.0000,  0.0000],
          [-0.8277, -0.7577, -0.6176,  ...,  0.0000,  0.0000,  0.0000],
          [-0.8102, -0.6877, -0.4426,  ...,  0.0000,  0.0000,  0.0000]],

         [[-1.6476, -1.6127, -1.5256,  ...,  0.0000,  0.0000,  0.0000],
          [-1.5604, -1.5430, -1.4907,  ...,  0.0000,  0.0000,  0.0000],
          [-1.4210, -1.4210, -1.4384,  ...,  0.0000,  0.0000,  0.0000],
          ...,
          [-1.1073, -1.1770, -1.3339,  ...,  0.0000,  0.0000,  0.0000],
          [-1.0898, -1.0550, -1.0027,  ...,  0.0000,  0.0000,  0.0000],
          [-1.0724, -0.9853, -0.8110,  ...,  0.0000,  0.0000,  0.0000]]]],
       device='cuda:0')]}

For example, passing this data directly into torch.jit.trace

using mmdeploy-0.14.0/tools/deploy.py

at mmdeploy-0.14.0/mmdeploy/apis/torch_jit/

    with RewriterContext(**context_info), torch.no_grad():
        # for exporting models with weight that depends on inputs
        # This call works: **inputs relays the kwargs correctly, and img_metas
        # still has ['ori_shape'] etc. during processing.
        func(*inputs) if isinstance(inputs, Sequence) \
            else func(return_loss=False, rescale=True, **inputs)

        # I would like to pass both inputs['img'] and inputs['img_metas'] as one
        # combined input, but torch.jit.trace interprets it as a tuple:
        # (input 1 = img, input 2 = img_metas).
        ts_model = torch.jit.trace(
            func,
            inputs,
            check_trace=check_trace,
            check_tolerance=check_tolerance,
            strict=False
        )
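
For reference, torch.jit.trace unpacks a tuple of example inputs into the positional arguments of the traced callable, and each element has to be something the tracer can type, such as a tensor. The sketch below shows this with a hypothetical scale_image function. It is a toy illustration, not MMDeploy code.

```python
import torch


def scale_image(img, factor):
    # Both arguments are tensors, so the tracer can infer their types.
    return img * factor


# Each tuple element becomes one positional argument of scale_image.
example_inputs = (torch.ones(1, 3, 4, 4), torch.tensor(2.0))
traced_fn = torch.jit.trace(scale_image, example_inputs)

# The second argument stays a real input because it was traced as a tensor.
out = traced_fn(torch.ones(1, 3, 2, 2), torch.tensor(3.0))
```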

Reproduction

  1. git clone https://github.com/zhanggang001/RefineMask.git
  2. git clone https://github.com/open-mmlab/mmdetection.git
  3. cd mmdetection && git checkout v2.28.2 && cd ..
  4. git clone https://github.com/open-mmlab/mmdeploy.git
  5. cd mmdeploy && git checkout v0.14.0 && cd ..
  6. Port the RefineMask changes into MMDetection v2.28.2 (see below)
  • add:
configs/refinemask/
mmdet/models/roi_heads/mask_heads/refine_mask_head.py
mmdet/utils/lvis_v0_5_categories.py
mmdet/utils/lvis_v1_0_categories.py
scripts/
tools/boundary_f1_score.py
tools/cocofied_lvis.py
tools/format_result.py
tools/lvis_filename_to2017.py
mmdet/datasets/cityscapes.py
mmdet/datasets/coco.py
mmdet/datasets/lvis.py
mmdet/models/losses/cross_entropy_loss.py
mmdet/models/roi_heads/__init__.py
mmdet/models/roi_heads/mask_heads/__init__.py
mmdet/builder.py
  7. Edit mmdeploy/apis/pytorch2torchscript.py
    torch_model = task_processor.init_pytorch_model(model_checkpoint)
    model_data, model_inputs = task_processor.create_input(img, input_shape) #FIXED
    
    if not isinstance(model_inputs, torch.Tensor):
        model_inputs = model_inputs[0]

    context_info = dict(deploy_cfg=deploy_cfg)
    backend = get_backend(deploy_cfg).value
    output_prefix = osp.join(work_dir, osp.splitext(save_file)[0])

    with no_mp():
        trace(  # FIXED
            torch_model,
            model_data,  # pass the full data dict ('img' and 'img_metas')
            # model_inputs,  # ORIGINAL
            output_path_prefix=output_prefix,
            backend=backend,
            context_info=context_info,
            check_trace=False)
  8. Edit mmdeploy/apis/torch_jit/trace.py when changing the torch.jit.trace input

  9. Run the model conversion with:

python mmdeploy-0.14.0/tools/deploy.py \
    mmdeploy-0.14.0/configs/mmdet/instance-seg/instance-seg_torchscript.py \
    .not_on_git/RefineMask/r50-refinemask-1x_fix2.py \
    .not_on_git/RefineMask/r50-coco-1x.pth \
    mmdetection-2.28.2/demo/demo.jpg \
    --work-dir mpc-work/refinemask_model_conv \
    --device cuda \
    --dump-info

Environment

/workspaces/rm-test/.venv/lib/python3.7/site-packages/mmcv/__init__.py:21: UserWarning: On January 1, 2023, MMCV will release v2.0.0, in which it will remove components related to the training process and add a data transformation module. In addition, it will rename the package names mmcv to mmcv-lite and mmcv-full to mmcv. See https://github.com/open-mmlab/mmcv/blob/master/docs/en/compatibility.md for more details.
  'On January 1, 2023, MMCV will release v2.0.0, in which it will remove '
sys.platform: linux
Python: 3.7.17 (default, Mar 14 2024, 18:58:10) [GCC 12.2.0]
CUDA available: True
GPU 0: NVIDIA GeForce RTX 2070 Super with Max-Q Design
CUDA_HOME: None
GCC: gcc (Debian 12.2.0-14) 12.2.0
PyTorch: 1.13.1+cu117
PyTorch compiling details: PyTorch built with:
  - GCC 9.3
  - C++ Version: 201402
  - Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.6.0 (Git Hash 52b5f107dd9cf10910aaa19cb47f3abf9b349815)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 11.7
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86
  - CuDNN 8.5
  - Magma 2.6.1
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.7, CUDNN_VERSION=8.5.0, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -fabi-version=11 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wunused-local-typedefs -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.13.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, 

TorchVision: 0.14.1+cu117
OpenCV: 4.9.0
MMCV: 1.7.2
MMCV Compiler: GCC 9.3
MMCV CUDA Compiler: 11.7
MMDetection: 2.28.2+a9aa27e

Error traceback

Process Process-2:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/usr/local/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/workspaces/rm-test/mmdeploy-0.14.0/mmdeploy/apis/core/pipeline_manager.py", line 107, in __call__
    ret = func(*args, **kwargs)
  File "/workspaces/rm-test/mmdeploy-0.14.0/mmdeploy/apis/pytorch2torchscript_edit.py", line 65, in torch2torchscript_edit
    check_trace=False)
  File "/workspaces/rm-test/mmdeploy-0.14.0/mmdeploy/apis/core/pipeline_manager.py", line 356, in _wrap
    return self.call_function(func_name_, *args, **kwargs)
  File "/workspaces/rm-test/mmdeploy-0.14.0/mmdeploy/apis/core/pipeline_manager.py", line 326, in call_function
    return self.call_function_local(func_name, *args, **kwargs)
  File "/workspaces/rm-test/mmdeploy-0.14.0/mmdeploy/apis/core/pipeline_manager.py", line 275, in call_function_local
    return pipe_caller(*args, **kwargs)
  File "/workspaces/rm-test/mmdeploy-0.14.0/mmdeploy/apis/core/pipeline_manager.py", line 107, in __call__
    ret = func(*args, **kwargs)
  File "/workspaces/rm-test/mmdeploy-0.14.0/mmdeploy/apis/torch_jit/trace_edit.py", line 138, in trace_edit
    strict=False
  File "/workspaces/rm-test/.venv/lib/python3.7/site-packages/torch/jit/_trace.py", line 768, in trace
    _module_class,
  File "/workspaces/rm-test/.venv/lib/python3.7/site-packages/torch/jit/_trace.py", line 983, in trace_module
    argument_names,
RuntimeError: Tracer cannot infer type of ({'img_metas': [[{'filename': '/workspaces/rm-test/mmdetection-2.28.2/demo/demo.jpg', 'ori_filename': '/workspaces/rm-test/mmdetection-2.28.2/demo/demo.jpg', 'ori_shape': (427, 640, 3), 'img_shape': tensor([ 800, 1216]), 'pad_shape': (800, 1216, 3), 'scale_factor': array([1.8734375, 1.8735363, 1.8734375, 1.8735363], dtype=float32), 'flip': False, 'flip_direction': None, 'img_norm_cfg': {'mean': array([123.675, 116.28 , 103.53 ], dtype=float32), 'std': array([58.395, 57.12 , 57.375], dtype=float32), 'to_rgb': True}}]], 'img': [tensor([[[[-1.8439, -1.7925, -1.7240,  ...,  0.0000,  0.0000,  0.0000],
          [-1.7754, -1.7412, -1.7069,  ...,  0.0000,  0.0000,  0.0000],
          [-1.6384, -1.6555, -1.6898,  ...,  0.0000,  0.0000,  0.0000],
          ...,
          [-1.0733, -1.1247, -1.2103,  ...,  0.0000,  0.0000,  0.0000],
          [-1.0390, -0.9877, -0.9020,  ...,  0.0000,  0.0000,  0.0000],
          [-1.0219, -0.9192, -0.7308,  ...,  0.0000,  0.0000,  0.0000]],

         [[-1.1429, -1.1078, -1.0553,  ...,  0.0000,  0.0000,  0.0000],
          [-1.0903, -1.0728, -1.0378,  ...,  0.0000,  0.0000,  0.0000],
          [-1.0028, -1.0028, -1.0028,  ...,  0.0000,  0.0000,  0.0000],
          ...,
          [-0.8627, -0.8978, -0.9503,  ...,  0.0000,  0.0000,  0.0000],
          [-0.8277, -0.7577, -0.6176,  ...,  0.0000,  0.0000,  0.0000],
          [-0.8102, -0.6877, -0.4426,  ...,  0.0000,  0.0000,  0.0000]],

         [[-1.6476, -1.6127, -1.5256,  ...,  0.0000,  0.0000,  0.0000],
          [-1.5604, -1.5430, -1.4907,  ...,  0.0000,  0.0000,  0.0000],
          [-1.4210, -1.4210, -1.4384,  ...,  0.0000,  0.0000,  0.0000],
          ...,
          [-1.1073, -1.1770, -1.3339,  ...,  0.0000,  0.0000,  0.0000],
          [-1.0898, -1.0550, -1.0027,  ...,  0.0000,  0.0000,  0.0000],
          [-1.0724, -0.9853, -0.8110,  ...,  0.0000,  0.0000,  0.0000]]]],
       device='cuda:0')]},)
:Could not infer type of list element: Could not infer type of list element: Dictionary inputs to traced functions must have consistent type. Found str and Tuple[int, int, int]
2024-03-22 13:48:18,142 - mmdeploy - ERROR - `mmdeploy.apis.pytorch2torchscript_edit.torch2torchscript_edit` with Call id: 0 failed. exit.
[03/22 13:48:18] mmdeploy ERROR: `mmdeploy.apis.pytorch2torchscript_edit.torch2torchscript_edit` with Call id: 0 failed. exit.
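
The last line of the RuntimeError says that a dict passed to a traced function must have values of one consistent type, and an img_metas entry mixes strings, tuples, booleans and more. The helper below is a hypothetical diagnostic, not part of PyTorch or MMDeploy, that lists the Python value types in such a dict. It uses a shortened copy of the metadata shown above.

```python
def value_type_names(meta):
    """Return the sorted names of the Python types used by a dict's values."""
    return sorted({type(value).__name__ for value in meta.values()})


# Shortened copy of one img_metas entry from this report.
meta = {
    "filename": "demo.jpg",
    "ori_shape": (427, 640, 3),
    "flip": False,
    "flip_direction": None,
}

types = value_type_names(meta)
print(types)  # more than one name means the tracer cannot type this dict
```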

If I pass only img:

    with RewriterContext(**context_info), torch.no_grad():
        # for exporting models with weight that depends on inputs
        # This call works: **inputs relays the kwargs correctly, and img_metas
        # still has ['ori_shape'] etc. during processing.
        func(*inputs) if isinstance(inputs, Sequence) \
            else func(return_loss=False, rescale=True, **inputs)

        ts_model = torch.jit.trace(
            func,
            inputs['img'][0],    # pass only img; inputs['img_metas'] is not given to torch.jit.trace
            check_trace=check_trace,
            check_tolerance=check_tolerance,
            strict=False
        )

results:

Process Process-2:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/usr/local/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/workspaces/rm-test/mmdeploy-0.14.0/mmdeploy/apis/core/pipeline_manager.py", line 107, in __call__
    ret = func(*args, **kwargs)
  File "/workspaces/rm-test/mmdeploy-0.14.0/mmdeploy/apis/pytorch2torchscript_edit.py", line 65, in torch2torchscript_edit
    check_trace=False)
  File "/workspaces/rm-test/mmdeploy-0.14.0/mmdeploy/apis/core/pipeline_manager.py", line 356, in _wrap
    return self.call_function(func_name_, *args, **kwargs)
  File "/workspaces/rm-test/mmdeploy-0.14.0/mmdeploy/apis/core/pipeline_manager.py", line 326, in call_function
    return self.call_function_local(func_name, *args, **kwargs)
  File "/workspaces/rm-test/mmdeploy-0.14.0/mmdeploy/apis/core/pipeline_manager.py", line 275, in call_function_local
    return pipe_caller(*args, **kwargs)
  File "/workspaces/rm-test/mmdeploy-0.14.0/mmdeploy/apis/core/pipeline_manager.py", line 107, in __call__
    ret = func(*args, **kwargs)
  File "/workspaces/rm-test/mmdeploy-0.14.0/mmdeploy/apis/torch_jit/trace_edit.py", line 137, in trace_edit
    strict=False
  File "/workspaces/rm-test/.venv/lib/python3.7/site-packages/torch/jit/_trace.py", line 768, in trace
    _module_class,
  File "/workspaces/rm-test/.venv/lib/python3.7/site-packages/torch/jit/_trace.py", line 983, in trace_module
    argument_names,
  File "/workspaces/rm-test/.venv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/workspaces/rm-test/.venv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1182, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/workspaces/rm-test/mmdeploy-0.14.0/mmdeploy/core/rewriters/rewriter_utils.py", line 402, in wrapper
    return self.func(self, *args, **kwargs)
  File "/workspaces/rm-test/mmdeploy-0.14.0/mmdeploy/codebase/mmdet/models/detectors/base.py", line 70, in base_detector__forward
    return __forward_impl(ctx, self, img, img_metas=img_metas, **kwargs)
  File "/workspaces/rm-test/mmdeploy-0.14.0/mmdeploy/core/optimizers/function_marker.py", line 261, in g
    rets = f(*args, **kwargs)
  File "/workspaces/rm-test/mmdeploy-0.14.0/mmdeploy/codebase/mmdet/models/detectors/base.py", line 26, in __forward_impl
    return self.simple_test(img, img_metas, **kwargs)
  File "/workspaces/rm-test/mmdeploy-0.14.0/mmdeploy/core/rewriters/rewriter_utils.py", line 402, in wrapper
    return self.func(self, *args, **kwargs)
  File "/workspaces/rm-test/mmdeploy-0.14.0/mmdeploy/codebase/mmdet/models/detectors/two_stage.py", line 59, in two_stage_detector__simple_test
    return self.roi_head.simple_test(x, proposals, img_metas, rescale=True)
  File "/workspaces/rm-test/mmdeploy-0.14.0/mmdeploy/core/rewriters/rewriter_utils.py", line 402, in wrapper
    return self.func(self, *args, **kwargs)
  File "/workspaces/rm-test/mmdeploy-0.14.0/mmdeploy/codebase/mmdet/models/roi_heads/standard_roi_head.py", line 59, in standard_roi_head__simple_test
    x, img_metas, det_bboxes, det_labels, rescale=True)
  File "/workspaces/rm-test/mmdetection-2.28.2/mmdet/models/roi_heads/refine_roi_head.py", line 86, in simple_test_mask
    ori_shape = img_metas[0]['ori_shape']

Model code where the error occurs:
at mmdetection-2.28.2/mmdet/models/roi_heads/refine_roi_head.py (added by RefineMask)

    def simple_test_mask(self, x, img_metas, det_bboxes, det_labels, rescale=False):
        """Simple test for mask head without augmentation."""
        # TODO (workaround): only the first image is processed; this should be extended to multiple input images.

        # ERROR HERE: img_metas has no ['ori_shape'] because img_metas could not be passed in.
        ori_shape = img_metas[0]['ori_shape']
        # This statement fails for the same reason.
        scale_factor = img_metas[0]['scale_factor']
        if det_bboxes[0].shape[0] == 0:
            segm_result = [[] for _ in range(self.mask_head.stage_num_classes[0])]
        else: