[Bug] [v0.14.0] Cannot relay 'img_metas' through torch.jit.trace

Question


RiverLight4 opened this issue 2 months ago

RiverLight4 commented 2 months ago

Checklist

1. I have searched related issues but cannot get the expected help.
2. I have read the FAQ documentation but cannot get the expected help.
3. The bug has not been fixed in the latest version.

Describe the bug

Hello,

I would like to run a model trained with RefineMask (https://github.com/zhanggang001/RefineMask) in ONNX. To do so, I ported RefineMask into MMDetection v2.28.2 and used MMDeploy v0.14.0. As a first step I tried converting the model to TorchScript, but the conversion failed.
By tracing through the MMDetection/MMDeploy code, I found that 'img_metas' is not forwarded to the model when it is called by torch.jit.trace during conversion.
The RefineMask head uses ['img_metas'][0]['ori_shape'] and ['img_metas'][0]['scale_factor'], so img_metas must be relayed through the model.
How can I pass both 'img' and 'img_metas' through torch.jit.trace?
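For context on why this fails: `torch.jit.trace` records only tensor operations, and its example inputs must be tensors or (possibly nested) lists, tuples, or dicts of tensors. Plain-Python metadata such as `ori_shape` or a NumPy `scale_factor` therefore cannot be a traced input. A common workaround is to trace a small wrapper that takes only the image tensor and holds `img_metas` as an attribute, so the tracer records the metadata values as constants. The toy sketch below shows that pattern; `ToyDetector` and `MetaBoundWrapper` are hypothetical names, not MMDetection or MMDeploy APIs, and a graph traced this way is only valid for inputs with the same `ori_shape` and `scale_factor` as the example.

```python
import torch
from torch import nn


class ToyDetector(nn.Module):
    """Hypothetical stand-in for a detector whose head reads img_metas."""

    def forward(self, img, img_metas):
        # Like RefineMask's simple_test_mask, read plain-Python metadata.
        ori_h, ori_w, _ = img_metas[0]['ori_shape']
        scale = img_metas[0]['scale_factor']
        # The output depends on both the tensor and the metadata.
        return img.mean() * scale + ori_h + ori_w


class MetaBoundWrapper(nn.Module):
    """Hypothetical wrapper: `img` is the only traced input; `img_metas`
    is captured at construction time and recorded as constants."""

    def __init__(self, model, img_metas):
        super().__init__()
        self.model = model
        self.img_metas = img_metas

    def forward(self, img):
        return self.model(img, self.img_metas)


img_metas = [{'ori_shape': (427, 640, 3), 'scale_factor': 1.5}]
wrapper = MetaBoundWrapper(ToyDetector(), img_metas)
traced = torch.jit.trace(wrapper, torch.zeros(1, 3, 8, 8))

# The metadata values are frozen into the graph: 1.0 * 1.5 + 427 + 640.
out = traced(torch.ones(1, 3, 8, 8))
print(out)
```

For the real detector, the wrapper's `forward` would presumably call the model the same way the working eager call in `trace.py` does (`return_loss=False, rescale=True`, with `img` and `img_metas` as keyword arguments). This sketch does not check whether MMDeploy's rewritten `BaseDetector.forward` keeps the supplied metadata.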

info

Because RefineMask is currently implemented on MMDetection v2.3.0, I expect porting it to MMDetection v3.x to be much more difficult, so I used v2.28.2. If converting models is easy with v3.x, I will use it.
If a newer Python (e.g. 3.12) is recommended, I can switch to it. I use Python 3.7 only because RefineMask is officially tested with it.

Required inputs:

Data generated with LoadImageFromFile, for example:

{'img_metas': [[{'filename': '/workspaces/rm-test/mmdetection-2.28.2/demo/demo.jpg', 'ori_filename': '/workspaces/rm-test/mmdetection-2.28.2/demo/demo.jpg', 'ori_shape': (427, 640, 3), 'img_shape': tensor([ 800, 1216]), 'pad_shape': (800, 1216, 3), 'scale_factor': array([1.8734375, 1.8735363, 1.8734375, 1.8735363], dtype=float32), 'flip': False, 'flip_direction': None, 'img_norm_cfg': {'mean': array([123.675, 116.28 , 103.53 ], dtype=float32), 'std': array([58.395, 57.12 , 57.375], dtype=float32), 'to_rgb': True}}]], 
 'img': [tensor([[[[-1.8439, -1.7925, -1.7240,  ...,  0.0000,  0.0000,  0.0000],
          [-1.7754, -1.7412, -1.7069,  ...,  0.0000,  0.0000,  0.0000],
          [-1.6384, -1.6555, -1.6898,  ...,  0.0000,  0.0000,  0.0000],
          ...,
          [-1.0733, -1.1247, -1.2103,  ...,  0.0000,  0.0000,  0.0000],
          [-1.0390, -0.9877, -0.9020,  ...,  0.0000,  0.0000,  0.0000],
          [-1.0219, -0.9192, -0.7308,  ...,  0.0000,  0.0000,  0.0000]],

         [[-1.1429, -1.1078, -1.0553,  ...,  0.0000,  0.0000,  0.0000],
          [-1.0903, -1.0728, -1.0378,  ...,  0.0000,  0.0000,  0.0000],
          [-1.0028, -1.0028, -1.0028,  ...,  0.0000,  0.0000,  0.0000],
          ...,
          [-0.8627, -0.8978, -0.9503,  ...,  0.0000,  0.0000,  0.0000],
          [-0.8277, -0.7577, -0.6176,  ...,  0.0000,  0.0000,  0.0000],
          [-0.8102, -0.6877, -0.4426,  ...,  0.0000,  0.0000,  0.0000]],

         [[-1.6476, -1.6127, -1.5256,  ...,  0.0000,  0.0000,  0.0000],
          [-1.5604, -1.5430, -1.4907,  ...,  0.0000,  0.0000,  0.0000],
          [-1.4210, -1.4210, -1.4384,  ...,  0.0000,  0.0000,  0.0000],
          ...,
          [-1.1073, -1.1770, -1.3339,  ...,  0.0000,  0.0000,  0.0000],
          [-1.0898, -1.0550, -1.0027,  ...,  0.0000,  0.0000,  0.0000],
          [-1.0724, -0.9853, -0.8110,  ...,  0.0000,  0.0000,  0.0000]]]],
       device='cuda:0')]}
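The metadata values in this dump can be reproduced from the original image size. This shows why they cannot be recovered from the image tensor alone: the tensor is padded to 800x1216, while `ori_shape` is 427x640 and the per-axis `scale_factor` comes from the unpadded resize. The sketch below assumes the common MMDetection v2 test-pipeline settings (`img_scale=(1333, 800)`, `keep_ratio=True`, `size_divisor=32`). Those settings are not stated in this issue, and the resize rule is assumed to mirror mmcv's keep-ratio rescaling.

```python
import math


def keep_ratio_size(h, w, long_edge=1333, short_edge=800):
    """Keep-ratio resize: largest scale that fits both edge limits (assumed config)."""
    scale = min(long_edge / max(h, w), short_edge / min(h, w))
    return int(h * scale + 0.5), int(w * scale + 0.5)


def pad_to_divisor(size, divisor=32):
    """Round a side length up to the next multiple of `divisor`."""
    return int(math.ceil(size / divisor)) * divisor


ori_h, ori_w = 427, 640                       # 'ori_shape' from the dump
new_h, new_w = keep_ratio_size(ori_h, ori_w)  # resized, unpadded size
w_scale, h_scale = new_w / ori_w, new_h / ori_h
scale_factor = [w_scale, h_scale, w_scale, h_scale]
pad_shape = (pad_to_divisor(new_h), pad_to_divisor(new_w))

print((new_h, new_w), pad_shape)  # (800, 1199) (800, 1216)
print(scale_factor)
```

The width scale 1199/640 and height scale 800/427 agree with the float32 `scale_factor` above. The padded width 1216 matches `pad_shape`, but it cannot be divided back into 640 to recover the true scale.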

Example: passing the inputs directly into `torch.jit.trace`

Run via mmdeploy-0.14.0/tools/deploy.py. The code below is in mmdeploy-0.14.0/mmdeploy/apis/torch_jit/:

    with RewriterContext(**context_info), torch.no_grad():
        # for exporting models with weight that depends on inputs
        func(*inputs) if isinstance(inputs, Sequence) \
            else func(return_loss=False, rescale=True, **inputs)  # This works: **inputs is unpacked as keyword arguments, so img_metas (with 'ori_shape' etc.) reaches the model.

        ts_model = torch.jit.trace(
            func,
            inputs,                 # I'd like to pass both inputs['img'] and inputs['img_metas'] as a combined torch.jit.trace input, but it is interpreted as a tuple ((input No.1=img), (input No.2=img_metas))
            check_trace=check_trace,
            check_tolerance=check_tolerance,
            strict=False
        )
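The `({...},)` at the start of the error message below shows that the tracer received the whole `inputs` dict as one positional argument. It did not unpack the dict into `img=` and `img_metas=` keyword arguments. A dict can be a traced input only when all its values are tensors of a consistent type, and `img_metas` (strings, tuples, NumPy arrays) does not meet that condition. The sketch below illustrates both points with hypothetical toy functions (`model_forward`, `forward_from_dict`). The `**` unpacking has to happen inside the traced callable, and the function traces only because every value is a tensor.

```python
import torch


def model_forward(img, scale):
    """Hypothetical stand-in for a forward(**kwargs)-style entry point."""
    return img * scale


def forward_from_dict(inputs):
    # The tracer hands the dict over as a single positional argument,
    # so any keyword unpacking must be done inside the traced callable.
    return model_forward(**inputs)


# Every value is a tensor, so the input type is Dict[str, Tensor].
example = {'img': torch.ones(2, 2), 'scale': torch.tensor(3.0)}
traced = torch.jit.trace(forward_from_dict, (example,),
                         check_trace=False, strict=False)

out = traced({'img': torch.full((2, 2), 2.0), 'scale': torch.tensor(3.0)})
print(out)
```

Replacing `'scale'` with a tuple such as `(427, 640, 3)` reproduces the same kind of "consistent type" error, because the dict would then mix tensor and tuple values.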

Reproduction

    git clone https://github.com/zhanggang001/RefineMask.git
    git clone https://github.com/open-mmlab/mmdetection.git
    cd mmdetection && git checkout v2.28.2 && cd ..
    git clone https://github.com/open-mmlab/mmdeploy.git
    cd mmdeploy && git checkout v0.14.0 && cd ..

Port the RefineMask changes into MMDetection v2.28.2 (see below).

add:

configs/refinemask/
mmdet/models/roi_heads/mask_heads/refine_mask_head.py
mmdet/utils/lvis_v0_5_categories.py
mmdet/utils/lvis_v1_0_categories.py
scripts/
tools/boundary_f1_score.py
tools/cocofied_lvis.py
tools/format_result.py
tools/lvis_filename_to2017.py

add & edit:

mmdet/models/roi_heads/refine_roi_head.py (see zhanggang001/RefineMask#38)

edit:

mmdet/datasets/cityscapes.py
mmdet/datasets/coco.py
mmdet/datasets/lvis.py
mmdet/models/losses/cross_entropy_loss.py
mmdet/models/roi_heads/__init__.py
mmdet/models/roi_heads/mask_heads/__init__.py
mmdet/builder.py

Edits to mmdeploy/apis/pytorch2torchscript.py:

    torch_model = task_processor.init_pytorch_model(model_checkpoint)
    model_data, model_inputs = task_processor.create_input(img, input_shape)  # CHANGED: keep model_data (dict with 'img' and 'img_metas')
    
    if not isinstance(model_inputs, torch.Tensor):
        model_inputs = model_inputs[0]

    context_info = dict(deploy_cfg=deploy_cfg)
    backend = get_backend(deploy_cfg).value
    output_prefix = osp.join(work_dir, osp.splitext(save_file)[0])

    with no_mp():
        trace(  # CHANGED: pass model_data instead of model_inputs
            torch_model,
            model_data,
#            model_inputs, #ORIGINAL
            output_path_prefix=output_prefix,
            backend=backend,
            context_info=context_info,
            check_trace=False)
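With this edit, `model_data` is the whole dict shown under "Required inputs". To keep only tensors as traced inputs, the dict first has to be split into the image tensor and the static metadata. The following is a hypothetical helper for that split (`split_test_inputs` is not an MMDeploy function). It assumes the MMDetection v2 test-time layout: `img` is a list over test-time augmentations of NxCxHxW tensors, and `img_metas` is a list over augmentations of per-image lists of dicts, which matches the nesting in the dump above.

```python
def split_test_inputs(model_data):
    """Hypothetical helper: return (img_tensor, img_metas) for one view."""
    imgs = model_data['img']         # outer list: test-time augmentations
    metas = model_data['img_metas']  # outer list: augmentations; inner: batch
    if len(imgs) != 1 or len(metas) != 1:
        # Tracing bakes in one fixed view; multi-view TTA is out of scope here.
        raise ValueError('expected exactly one test-time view')
    img, img_metas = imgs[0], metas[0]
    return img, img_metas


# Stand-in values; in the real pipeline imgs[0] is a torch.Tensor.
data = {'img': ['IMG_TENSOR'],
        'img_metas': [[{'ori_shape': (427, 640, 3)}]]}
img, img_metas = split_test_inputs(data)
print(img, img_metas[0]['ori_shape'])  # IMG_TENSOR (427, 640, 3)
```

`img` can then be the sole example input to `torch.jit.trace`, and `img_metas` can be supplied to the model outside the traced arguments, for example by capturing it in a wrapper module.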

Edit mmdeploy/apis/torch_jit/trace.py as well if you change the torch.jit.trace input.

Run the model conversion with:

    python mmdeploy-0.14.0/tools/deploy.py \
        mmdeploy-0.14.0/configs/mmdet/instance-seg/instance-seg_torchscript.py \
        .not_on_git/RefineMask/r50-refinemask-1x_fix2.py \
        .not_on_git/RefineMask/r50-coco-1x.pth \
        mmdetection-2.28.2/demo/demo.jpg \
        --work-dir mpc-work/refinemask_model_conv \
        --device cuda \
        --dump-info

Environment

sys.platform: linux
Python: 3.7.17 (default, Mar 14 2024, 18:58:10) [GCC 12.2.0]
CUDA available: True
GPU 0: NVIDIA GeForce RTX 2070 Super with Max-Q Design
CUDA_HOME: None
GCC: gcc (Debian 12.2.0-14) 12.2.0
PyTorch: 1.13.1+cu117
PyTorch compiling details: PyTorch built with:
  - GCC 9.3
  - C++ Version: 201402
  - Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.6.0 (Git Hash 52b5f107dd9cf10910aaa19cb47f3abf9b349815)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 11.7
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86
  - CuDNN 8.5
  - Magma 2.6.1
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.7, CUDNN_VERSION=8.5.0, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -fabi-version=11 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wunused-local-typedefs -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.13.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, 

TorchVision: 0.14.1+cu117
OpenCV: 4.9.0
MMCV: 1.7.2
MMCV Compiler: GCC 9.3
MMCV CUDA Compiler: 11.7
MMDetection: 2.28.2+a9aa27e

Error traceback

Process Process-2:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/usr/local/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/workspaces/rm-test/mmdeploy-0.14.0/mmdeploy/apis/core/pipeline_manager.py", line 107, in __call__
    ret = func(*args, **kwargs)
  File "/workspaces/rm-test/mmdeploy-0.14.0/mmdeploy/apis/pytorch2torchscript_edit.py", line 65, in torch2torchscript_edit
    check_trace=False)
  File "/workspaces/rm-test/mmdeploy-0.14.0/mmdeploy/apis/core/pipeline_manager.py", line 356, in _wrap
    return self.call_function(func_name_, *args, **kwargs)
  File "/workspaces/rm-test/mmdeploy-0.14.0/mmdeploy/apis/core/pipeline_manager.py", line 326, in call_function
    return self.call_function_local(func_name, *args, **kwargs)
  File "/workspaces/rm-test/mmdeploy-0.14.0/mmdeploy/apis/core/pipeline_manager.py", line 275, in call_function_local
    return pipe_caller(*args, **kwargs)
  File "/workspaces/rm-test/mmdeploy-0.14.0/mmdeploy/apis/core/pipeline_manager.py", line 107, in __call__
    ret = func(*args, **kwargs)
  File "/workspaces/rm-test/mmdeploy-0.14.0/mmdeploy/apis/torch_jit/trace_edit.py", line 138, in trace_edit
    strict=False
  File "/workspaces/rm-test/.venv/lib/python3.7/site-packages/torch/jit/_trace.py", line 768, in trace
    _module_class,
  File "/workspaces/rm-test/.venv/lib/python3.7/site-packages/torch/jit/_trace.py", line 983, in trace_module
    argument_names,
RuntimeError: Tracer cannot infer type of ({'img_metas': [[{'filename': '/workspaces/rm-test/mmdetection-2.28.2/demo/demo.jpg', 'ori_filename': '/workspaces/rm-test/mmdetection-2.28.2/demo/demo.jpg', 'ori_shape': (427, 640, 3), 'img_shape': tensor([ 800, 1216]), 'pad_shape': (800, 1216, 3), 'scale_factor': array([1.8734375, 1.8735363, 1.8734375, 1.8735363], dtype=float32), 'flip': False, 'flip_direction': None, 'img_norm_cfg': {'mean': array([123.675, 116.28 , 103.53 ], dtype=float32), 'std': array([58.395, 57.12 , 57.375], dtype=float32), 'to_rgb': True}}]], 'img': [tensor([[[[-1.8439, -1.7925, -1.7240,  ...,  0.0000,  0.0000,  0.0000],
          [-1.7754, -1.7412, -1.7069,  ...,  0.0000,  0.0000,  0.0000],
          [-1.6384, -1.6555, -1.6898,  ...,  0.0000,  0.0000,  0.0000],
          ...,
          [-1.0733, -1.1247, -1.2103,  ...,  0.0000,  0.0000,  0.0000],
          [-1.0390, -0.9877, -0.9020,  ...,  0.0000,  0.0000,  0.0000],
          [-1.0219, -0.9192, -0.7308,  ...,  0.0000,  0.0000,  0.0000]],

         [[-1.1429, -1.1078, -1.0553,  ...,  0.0000,  0.0000,  0.0000],
          [-1.0903, -1.0728, -1.0378,  ...,  0.0000,  0.0000,  0.0000],
          [-1.0028, -1.0028, -1.0028,  ...,  0.0000,  0.0000,  0.0000],
          ...,
          [-0.8627, -0.8978, -0.9503,  ...,  0.0000,  0.0000,  0.0000],
          [-0.8277, -0.7577, -0.6176,  ...,  0.0000,  0.0000,  0.0000],
          [-0.8102, -0.6877, -0.4426,  ...,  0.0000,  0.0000,  0.0000]],

         [[-1.6476, -1.6127, -1.5256,  ...,  0.0000,  0.0000,  0.0000],
          [-1.5604, -1.5430, -1.4907,  ...,  0.0000,  0.0000,  0.0000],
          [-1.4210, -1.4210, -1.4384,  ...,  0.0000,  0.0000,  0.0000],
          ...,
          [-1.1073, -1.1770, -1.3339,  ...,  0.0000,  0.0000,  0.0000],
          [-1.0898, -1.0550, -1.0027,  ...,  0.0000,  0.0000,  0.0000],
          [-1.0724, -0.9853, -0.8110,  ...,  0.0000,  0.0000,  0.0000]]]],
       device='cuda:0')]},)
:Could not infer type of list element: Could not infer type of list element: Dictionary inputs to traced functions must have consistent type. Found str and Tuple[int, int, int]
2024-03-22 13:48:18,142 - mmdeploy - ERROR - `mmdeploy.apis.pytorch2torchscript_edit.torch2torchscript_edit` with Call id: 0 failed. exit.
[03/22 13:48:18] mmdeploy ERROR: `mmdeploy.apis.pytorch2torchscript_edit.torch2torchscript_edit` with Call id: 0 failed. exit.

RiverLight4 · Answer 1 · Fri Mar 22 2024 16:12:10 GMT+0800 (China Standard Time)

If I pass only img to torch.jit.trace:

    with RewriterContext(**context_info), torch.no_grad():
        # for exporting models with weight that depends on inputs
        func(*inputs) if isinstance(inputs, Sequence) \
            else func(return_loss=False, rescale=True, **inputs)  # This works: **inputs is unpacked as keyword arguments, so img_metas (with 'ori_shape' etc.) reaches the model.

        ts_model = torch.jit.trace(
            func,
            inputs['img'][0],    # only input img, not "inputs['img_metas']" as torch.jit.trace input.
            check_trace=check_trace,
            check_tolerance=check_tolerance,
            strict=False
        )

Result:

Process Process-2:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/usr/local/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/workspaces/rm-test/mmdeploy-0.14.0/mmdeploy/apis/core/pipeline_manager.py", line 107, in __call__
    ret = func(*args, **kwargs)
  File "/workspaces/rm-test/mmdeploy-0.14.0/mmdeploy/apis/pytorch2torchscript_edit.py", line 65, in torch2torchscript_edit
    check_trace=False)
  File "/workspaces/rm-test/mmdeploy-0.14.0/mmdeploy/apis/core/pipeline_manager.py", line 356, in _wrap
    return self.call_function(func_name_, *args, **kwargs)
  File "/workspaces/rm-test/mmdeploy-0.14.0/mmdeploy/apis/core/pipeline_manager.py", line 326, in call_function
    return self.call_function_local(func_name, *args, **kwargs)
  File "/workspaces/rm-test/mmdeploy-0.14.0/mmdeploy/apis/core/pipeline_manager.py", line 275, in call_function_local
    return pipe_caller(*args, **kwargs)
  File "/workspaces/rm-test/mmdeploy-0.14.0/mmdeploy/apis/core/pipeline_manager.py", line 107, in __call__
    ret = func(*args, **kwargs)
  File "/workspaces/rm-test/mmdeploy-0.14.0/mmdeploy/apis/torch_jit/trace_edit.py", line 137, in trace_edit
    strict=False
  File "/workspaces/rm-test/.venv/lib/python3.7/site-packages/torch/jit/_trace.py", line 768, in trace
    _module_class,
  File "/workspaces/rm-test/.venv/lib/python3.7/site-packages/torch/jit/_trace.py", line 983, in trace_module
    argument_names,
  File "/workspaces/rm-test/.venv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/workspaces/rm-test/.venv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1182, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/workspaces/rm-test/mmdeploy-0.14.0/mmdeploy/core/rewriters/rewriter_utils.py", line 402, in wrapper
    return self.func(self, *args, **kwargs)
  File "/workspaces/rm-test/mmdeploy-0.14.0/mmdeploy/codebase/mmdet/models/detectors/base.py", line 70, in base_detector__forward
    return __forward_impl(ctx, self, img, img_metas=img_metas, **kwargs)
  File "/workspaces/rm-test/mmdeploy-0.14.0/mmdeploy/core/optimizers/function_marker.py", line 261, in g
    rets = f(*args, **kwargs)
  File "/workspaces/rm-test/mmdeploy-0.14.0/mmdeploy/codebase/mmdet/models/detectors/base.py", line 26, in __forward_impl
    return self.simple_test(img, img_metas, **kwargs)
  File "/workspaces/rm-test/mmdeploy-0.14.0/mmdeploy/core/rewriters/rewriter_utils.py", line 402, in wrapper
    return self.func(self, *args, **kwargs)
  File "/workspaces/rm-test/mmdeploy-0.14.0/mmdeploy/codebase/mmdet/models/detectors/two_stage.py", line 59, in two_stage_detector__simple_test
    return self.roi_head.simple_test(x, proposals, img_metas, rescale=True)
  File "/workspaces/rm-test/mmdeploy-0.14.0/mmdeploy/core/rewriters/rewriter_utils.py", line 402, in wrapper
    return self.func(self, *args, **kwargs)
  File "/workspaces/rm-test/mmdeploy-0.14.0/mmdeploy/codebase/mmdet/models/roi_heads/standard_roi_head.py", line 59, in standard_roi_head__simple_test
    x, img_metas, det_bboxes, det_labels, rescale=True)
  File "/workspaces/rm-test/mmdetection-2.28.2/mmdet/models/roi_heads/refine_roi_head.py", line 86, in simple_test_mask
    ori_shape = img_metas[0]['ori_shape']

Failing model code, in mmdetection-2.28.2/mmdet/models/roi_heads/refine_roi_head.py (added by RefineMask):

    def simple_test_mask(self, x, img_metas, det_bboxes, det_labels, rescale=False):
        """Simple test for mask head without augmentation."""
        ### TODO: WORKAROUND: Only the first image is processed. This should be extended to multi-image inputs.

        ori_shape = img_metas[0]['ori_shape']           ## ERROR HERE: img_metas has no 'ori_shape' because I could not pass img_metas in
        scale_factor = img_metas[0]['scale_factor']     ## This statement also fails
        if det_bboxes[0].shape[0] == 0:
            segm_result = [[] for _ in range(self.mask_head.stage_num_classes[0])]
        else:
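The head reads `ori_shape` and `scale_factor` without checking whether they exist. When tracing runs without the original metadata, the failure therefore surfaces deep inside the ROI head, presumably as a bare `KeyError`; the traceback above ends at exactly that line. The following is a hypothetical guard (`require_meta` is not part of RefineMask or MMDeploy) that could sit at the top of `simple_test_mask` to fail early with a readable message.

```python
REQUIRED_META_KEYS = ('ori_shape', 'scale_factor')


def require_meta(img_metas, keys=REQUIRED_META_KEYS):
    """Hypothetical guard: return the requested img_metas[0] entries, or raise
    a KeyError naming every missing key and the keys that are present."""
    meta = img_metas[0] if img_metas else {}
    missing = [k for k in keys if k not in meta]
    if missing:
        raise KeyError(
            f'img_metas[0] lacks {missing}; was img_metas passed to the '
            f'model during tracing? available keys: {sorted(meta)}')
    return tuple(meta[k] for k in keys)


# Placeholder metadata; the real scale_factor is a 4-element float32 array.
meta = {'ori_shape': (427, 640, 3), 'scale_factor': 1.87, 'img_shape': (800, 1216)}
ori_shape, scale_factor = require_meta([meta])
print(ori_shape, scale_factor)
```

This does not make tracing succeed. It only turns an opaque lookup failure into a message that points at the missing `img_metas`.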
