[Bug] GatherND shape conversion from ONNX is inaccurate

PINTO0309 opened this issue 3 years ago

1. System information (version)

OpenVINO=> 2021.4.0-3839
Operating System / Platform => Ubuntu 20.04 x86_64
Compiler => GLIBC 2.31, g++ 9.3.0
Problem classification => Model Conversion
Framework: ONNX opset=12 (TensorFlow)
Model name: HITNET
ONNX, tflite, TensorFlow sample

2. Detailed description

Converting a GatherND node from ONNX with batch_dims=3 does not produce the expected output shape. As a result, shape inference fails at a subsequent operation and Model Optimizer aborts.

Currently, the tool behaves as follows.

Op: GatherND
input[0] : shape = [ 1 64 64 320]
input[1] : shape = [ 1 64 64 1 1]
output[0]: shape = [ 4096 1]

However, the expected behavior is as follows.

Op: GatherND
input[0] : shape = [ 1 64 64 320]
input[1] : shape = [ 1 64 64 1 1]
output[0]: shape = [ 1 64 64 1]
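
The two shapes above can be reproduced with a small shape calculation. The sketch below is not Model Optimizer code, and both function names are hypothetical. The first function follows the ONNX GatherND shape rule, which keeps the batch dimensions. The second assumes that the converter collapses the leading batch_dims dimensions into a single dimension, which is consistent with the [4096 1] shape reported in the log.

```python
from math import prod


def gathernd_shape_onnx(data_shape, indices_shape, batch_dims=0):
    """Output shape under the ONNX rule: batch dimensions are preserved."""
    k = indices_shape[-1]  # number of data dimensions addressed by each index tuple
    return list(indices_shape[:-1]) + list(data_shape[batch_dims + k:])


def gathernd_shape_flattened_batch(data_shape, indices_shape, batch_dims=0):
    """Output shape if the leading batch dimensions are merged into one.

    Assumption: this mirrors the behavior observed in this issue.
    """
    k = indices_shape[-1]
    batch = [prod(indices_shape[:batch_dims])] if batch_dims > 0 else []
    return batch + list(indices_shape[batch_dims:-1]) + list(data_shape[batch_dims + k:])


data_shape = [1, 64, 64, 320]
indices_shape = [1, 64, 64, 1, 1]

expected = gathernd_shape_onnx(data_shape, indices_shape, batch_dims=3)
observed = gathernd_shape_flattened_batch(data_shape, indices_shape, batch_dims=3)
print(expected)  # [1, 64, 64, 1]
print(observed)  # [4096, 1]
```

With batch_dims=0 the two functions return the same shape, so the discrepancy only shows up for batched gathers such as this one.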

Log messages follow. (The @@@@ lines come from debug prints that I added to the Model Optimizer code.)

[ 2021-09-05 17:22:08,287 ] [ DEBUG ] [ infer:116 ]  --------------------
[ 2021-09-05 17:22:08,287 ] [ DEBUG ] [ infer:117 ]  Partial infer for level0_1/level_init/GatherV2_1;level0/level_init/GatherV2_1/axis
[ 2021-09-05 17:22:08,287 ] [ DEBUG ] [ infer:118 ]  Op: GatherND
[ 2021-09-05 17:22:08,287 ] [ DEBUG ] [ infer:119 ]  Inputs:
[ 2021-09-05 17:22:08,287 ] [ DEBUG ] [ infer:19 ]  input[0]: shape = [  1  64  64 320], value = <UNKNOWN>
[ 2021-09-05 17:22:08,287 ] [ DEBUG ] [ infer:19 ]  input[1]: shape = [ 1 64 64  1  1], value = <UNKNOWN>
[ 2021-09-05 17:22:08,288 ] [ DEBUG ] [ infer:132 ]  Outputs:
[ 2021-09-05 17:22:08,288 ] [ DEBUG ] [ infer:19 ]  output[0]: shape = [4096    1], value = <UNKNOWN>
[ 2021-09-05 17:22:08,288 ] [ DEBUG ] [ infer:116 ]  --------------------
[ 2021-09-05 17:22:08,288 ] [ DEBUG ] [ infer:117 ]  Partial infer for Mul__311
[ 2021-09-05 17:22:08,288 ] [ DEBUG ] [ infer:118 ]  Op: Mul
[ 2021-09-05 17:22:08,288 ] [ DEBUG ] [ infer:119 ]  Inputs:
[ 2021-09-05 17:22:08,288 ] [ DEBUG ] [ infer:19 ]  input[0]: shape = [ 1 64 64  1], value = <UNKNOWN>
[ 2021-09-05 17:22:08,288 ] [ DEBUG ] [ infer:19 ]  input[1]: shape = [], value = 0
[ 2021-09-05 17:22:08,288 ] [ DEBUG ] [ infer:132 ]  Outputs:
[ 2021-09-05 17:22:08,288 ] [ DEBUG ] [ infer:19 ]  output[0]: shape = [ 1 64 64  1], value = <UNKNOWN>
[ 2021-09-05 17:22:08,288 ] [ DEBUG ] [ infer:116 ]  --------------------
[ 2021-09-05 17:22:08,288 ] [ DEBUG ] [ infer:117 ]  Partial infer for level0/level_init/init_to_prop/concat
[ 2021-09-05 17:22:08,288 ] [ DEBUG ] [ infer:118 ]  Op: Concat
[ 2021-09-05 17:22:08,288 ] [ DEBUG ] [ infer:119 ]  Inputs:
[ 2021-09-05 17:22:08,288 ] [ DEBUG ] [ infer:19 ]  input[0]: shape = [4096    1], value = <UNKNOWN>
[ 2021-09-05 17:22:08,289 ] [ DEBUG ] [ infer:19 ]  input[1]: shape = [ 1 64 64 48], value = <UNKNOWN>
@@@@@@@@@@@@@@@@@@@@@@@@@ shape: [4096    1]
@@@@@@@@@@@@@@@@@@@@@@@@@ not_mask: [ True False]
@@@@@@@@@@@@@@@@@@@@@@@@@ s: [ 1 64 64 48]
[ ERROR ]  Cannot infer shapes or values for node "level0/level_init/init_to_prop/concat".
[ ERROR ]  boolean index did not match indexed array along dimension 0; dimension is 4 but corresponding boolean dimension is 2
[ ERROR ]  
[ ERROR ]  It can happen due to bug in custom shape infer function <function concat_infer at 0x7fa5c296daf0>.
[ ERROR ]  Or because the node inputs have incorrect values/shapes.
[ ERROR ]  Or because input shapes are incorrect (embedded to the model or passed via --input_shape).
[ 2021-09-05 17:22:08,289 ] [ DEBUG ] [ infer:184 ]  Node "level0/level_init/init_to_prop/concat" attributes: {'pb': input: "level0/level_init/GatherV2_1;level0/level_init/GatherV2_1/axis"
input: "fe_shared_3"
output: "level0/level_init/init_to_prop/concat"
name: "level0/level_init/init_to_prop/concat"
op_type: "Concat"
attribute {
  name: "axis"
  i: -1
  type: INT
}
, 'kind': 'op', '_in_ports': {1: {'control_flow': False}, 0: {'control_flow': False}}, '_out_ports': {0: {'control_flow': False}}, 'name': 'level0/level_init/init_to_prop/concat', 'op': 'Concat', 'type': 'Concat', 'version': 'opset1', 'axis': 1, 'infer': <function concat_infer at 0x7fa5c296daf0>, 'out_ports_count': 1, 'dim_attrs': ['batch_dims', 'axis', 'spatial_dims', 'channel_dims'], 'shape_attrs': ['stride', 'shape', 'pad', 'output_shape', 'window'], 'IE': [('layer', [('id', <function Op.substitute_ie_attrs.<locals>.<lambda> at 0x7fa5b96fd430>), 'name', 'type', 'version'], [('data', ['axis'], []), '@ports', '@consts'])], 'is_output_reachable': True, 'is_undead': False, 'is_const_producer': False, 'is_partial_inferred': False}
[ ERROR ]  Exception occurred during running replacer "REPLACEMENT_ID" (<class 'extensions.middle.PartialInfer.PartialInfer'>): Stopped shape/value propagation at "level0/level_init/init_to_prop/concat" node. 
 For more information please refer to Model Optimizer FAQ, question #38. (https://docs.openvinotoolkit.org/latest/openvino_docs_MO_DG_prepare_model_Model_Optimizer_FAQ.html?question=38#question-38)
[ 2021-09-05 17:22:08,290 ] [ DEBUG ] [ main:410 ]  Traceback (most recent call last):
  File "/opt/intel/openvino_2021/deployment_tools/model_optimizer/mo/middle/passes/infer.py", line 122, in partial_infer
    node.infer(node)
  File "/opt/intel/openvino_2021/deployment_tools/model_optimizer/mo/front/common/partial_infer/concat.py", line 45, in concat_infer
    if np.all(shape[not_mask] == s[not_mask]):  # TODO handle -1 in a special way
IndexError: boolean index did not match indexed array along dimension 0; dimension is 4 but corresponding boolean dimension is 2

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/intel/openvino_2021/deployment_tools/model_optimizer/mo/utils/class_registration.py", line 276, in apply_transform
    replacer.find_and_replace_pattern(graph)
  File "/opt/intel/openvino_2021/deployment_tools/model_optimizer/extensions/middle/PartialInfer.py", line 21, in find_and_replace_pattern
    partial_infer(graph)
  File "/opt/intel/openvino_2021/deployment_tools/model_optimizer/mo/middle/passes/infer.py", line 185, in partial_infer
    raise Error('Stopped shape/value propagation at "{}" node. '.format(node.soft_get('name')) +
mo.utils.error.Error: Stopped shape/value propagation at "level0/level_init/init_to_prop/concat" node. 
 For more information please refer to Model Optimizer FAQ, question #38. (https://docs.openvinotoolkit.org/latest/openvino_docs_MO_DG_prepare_model_Model_Optimizer_FAQ.html?question=38#question-38)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/intel/openvino_2021.4.582/deployment_tools/model_optimizer/mo/main.py", line 394, in main
    ret_code = driver(argv)
  File "/opt/intel/openvino_2021.4.582/deployment_tools/model_optimizer/mo/main.py", line 356, in driver
    ret_res = emit_ir(prepare_ir(argv), argv)
  File "/opt/intel/openvino_2021.4.582/deployment_tools/model_optimizer/mo/main.py", line 252, in prepare_ir
    graph = unified_pipeline(argv)
  File "/opt/intel/openvino_2021/deployment_tools/model_optimizer/mo/pipeline/unified.py", line 13, in unified_pipeline
    class_registration.apply_replacements(graph, [
  File "/opt/intel/openvino_2021/deployment_tools/model_optimizer/mo/utils/class_registration.py", line 328, in apply_replacements
    apply_replacements_list(graph, replacers_order)
  File "/opt/intel/openvino_2021/deployment_tools/model_optimizer/mo/utils/class_registration.py", line 314, in apply_replacements_list
    apply_transform(
  File "/opt/intel/openvino_2021/deployment_tools/model_optimizer/mo/utils/logger.py", line 111, in wrapper
    function(*args, **kwargs)
  File "/opt/intel/openvino_2021/deployment_tools/model_optimizer/mo/utils/class_registration.py", line 294, in apply_transform
    raise Error('Exception occurred during running replacer "{}" ({}): {}'.format(
mo.utils.error.Error: Exception occurred during running replacer "REPLACEMENT_ID" (<class 'extensions.middle.PartialInfer.PartialInfer'>): Stopped shape/value propagation at "level0/level_init/init_to_prop/concat" node. 
 For more information please refer to Model Optimizer FAQ, question #38. (https://docs.openvinotoolkit.org/latest/openvino_docs_MO_DG_prepare_model_Model_Optimizer_FAQ.html?question=38#question-38)
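
The IndexError in the first traceback can be reproduced in isolation with NumPy. This is a minimal sketch built only from the values in the debug prints above. How not_mask is derived here is an assumption (a boolean mask over the rank of the first input, with the concatenation axis excluded); it is not a copy of concat_infer.

```python
import numpy as np

# Values taken from the @@@@ debug prints in the log.
shape = np.array([4096, 1])      # first Concat input (GatherND output)
s = np.array([1, 64, 64, 48])    # second Concat input
axis = -1                        # Concat axis from the ONNX node

# Assumed derivation: mark the concat axis, then invert the mask.
mask = np.zeros_like(shape, dtype=bool)
mask[axis] = True
not_mask = np.logical_not(mask)  # [ True False], sized for a rank-2 shape


def compare_non_axis_dims(shape, s, not_mask):
    """Same comparison as the line shown in the traceback."""
    return bool(np.all(shape[not_mask] == s[not_mask]))


try:
    compare_non_axis_dims(shape, s, not_mask)
    error = None
except IndexError as exc:
    # The rank-2 mask cannot index the rank-4 shape array.
    error = exc
    print(type(exc).__name__, exc)
```

The failure is therefore a symptom rather than the cause: the comparison assumes both inputs have the same rank, and that assumption is broken by the rank-2 GatherND output.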

The part of the ONNX graph where the problem occurs is shown in the figure below.

Official GatherND documentation
https://docs.openvinotoolkit.org/latest/openvino_docs_ops_movement_GatherND_5.html

3. Steps to reproduce

Download model_float32_opt.onnx.zip
Unzip the ZIP.
Convert with the following command.

${INTEL_OPENVINO_DIR}/deployment_tools/model_optimizer/mo.py \
--input_model model_float32_opt.onnx \
--data_type FP32 \
--output_dir openvino/FP32 \
--log_level=DEBUG

Optional: Original HITNET model URL

4. Issue submission checklist

I report the issue, it's not a question
I checked the problem with documentation, FAQ, open issues, Stack Overflow, etc and have not found solution
There is reproducer code and related data files: images, videos, models, etc.

Iffa_Intel · Answer 1 · Tue Sep 07 2021 15:21:44 GMT+0800 (China Standard Time)

Hi,

OpenVINO supports a variety of models and topologies, but the set is limited.
The supported topologies are listed here.

As you may have noticed, your model topology is not listed in that documentation. Hence, it is not supported and issues are expected.

Katsuya Hyodo · Answer 2 · Tue Sep 07 2021 15:38:37 GMT+0800 (China Standard Time)

Thank you.
I know that it is not listed among the supported topologies, and an error is within my expectations. Please don't shift the focus.

Even if it's not a "bug," the behavior is clearly strange. I just wanted to report that the behavior is not correct.

Iffa_Intel · Answer 3 · Wed Sep 08 2021 10:39:58 GMT+0800 (China Standard Time)

Not sure whether you noticed this, but you will need to use the --input_shape parameter.
It specifies the shape that should be fed to the input node(s) of the model.

The shape is defined as a comma-separated list of integers enclosed in parentheses or square brackets, for example [1,3,227,227] or (1,227,227,3), where the order of dimensions depends on the framework input layout of the model. For example, [N,C,H,W] is used for Caffe* models and [N,H,W,C] for TensorFlow* models.

Generally, Model Optimizer performs necessary transformations to convert the shape to the layout required by Inference Engine (N,C,H,W).

The shape should not contain undefined dimensions (? or -1) and should fit the dimensions defined in the input operation of the graph.

Katsuya Hyodo · Answer 4 · Wed Sep 08 2021 16:52:17 GMT+0800 (China Standard Time)

"Generally, Model Optimizer performs necessary transformations to convert the shape to the layout required by Inference Engine (N,C,H,W)."

Yes. There are no undefined dimensions. Also, the model is NCHW.

Jesus Espinoza · Answer 5 · Sat Sep 11 2021 07:12:03 GMT+0800 (China Standard Time)

Hi @PINTO0309

Could you double check the level0/level_init/GatherV2_1;level0/level_init/GatherV2_1/axis operation in your ONNX model? I'm trying to run your ONNX model directly with the benchmark_app and am seeing the following error.

python "C:\Program Files (x86)\Intel\openvino_2021\deployment_tools\tools\benchmark_tool\benchmark_app.py" -m model_float32_opt.onnx
RuntimeError: While validating ONNX node '<Node(Concat): level0/level_init/init_to_prop/concat>':
Check 'PartialShape::merge_into(inputs_shape_scheme, this_input_shape)' failed at C:\j\workspace\private-ci\ie\build-windows-vs2019\b\repos\openvino\ngraph\core\src\op\concat.cpp:86:
While validating node 'v0::Concat Concat_1106 (level0/level_init/GatherV2_1;level0/level_init/GatherV2_1/axis[0]:f32{4096,1}, fe_shared_31[0]:f32{1,64,64,48}) -> (dynamic?)' with friendly_name 'Concat_1106':
Argument shapes are inconsistent; they must have the same rank, and must have equal dimension everywhere except on the concatenation axis (axis 1).

Regards,
Jesus
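
The rule quoted in the error message (same rank, and equal dimensions everywhere except on the concatenation axis) can be written as a small check. This is an illustrative sketch with a hypothetical function name, not the nGraph implementation.

```python
def concat_compatible(shapes, axis):
    """Return True if all shapes can be concatenated along `axis`.

    Hypothetical helper: checks equal rank and equal dimensions
    on every axis other than the concatenation axis.
    """
    rank = len(shapes[0])
    if rank == 0 or any(len(shape) != rank for shape in shapes):
        return False
    axis = axis % rank  # normalize a negative axis against the rank
    return all(
        shape[dim] == shapes[0][dim]
        for shape in shapes
        for dim in range(rank)
        if dim != axis
    )


# Shapes reported in the error message: the ranks differ, so the check fails.
print(concat_compatible([[4096, 1], [1, 64, 64, 48]], axis=-1))      # False
# Shapes that the original post expects: compatible along the last axis.
print(concat_compatible([[1, 64, 64, 1], [1, 64, 64, 48]], axis=-1))  # True
```

Note that axis -1 normalizes to 1 for a rank-2 first input, which presumably explains why the message refers to "axis 1" even though the ONNX node specifies axis -1.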

Katsuya Hyodo · Answer 6 · Mon Sep 13 2021 13:42:00 GMT+0800 (China Standard Time)

@jgespino Thank you for your reply. 😄

I tried inference using ONNX Runtime alone, without benchmark_app, to isolate whether the problem lies in the structure of the model, the conversion tool, or the benchmark tool. The source code used for the test and the results of the inference are shown in the figure below. The model works fine with ONNX Runtime, but gives an error with the OpenVINO toolkit.

https://github.com/ibaiGorordo/ONNX-HITNET-Stereo-Depth-estimation

I opened this issue because there does not seem to be any problem with the structure of the model itself. However, I am not familiar with the internals of OpenVINO, so I don't know what else I can investigate or what further information I can provide. The program used to check the ONNX model was created by @ibaiGorordo, but I generated the ONNX model and gave it to him.

Although it is not the entire program, below is the portion that feeds two still images into ONNX Runtime and gets the depth estimation results.

import cv2
from hitnet import HitNet, ModelType, draw_disparity, draw_depth, CameraConfig, load_img
import numpy as np
from imread_from_url import imread_from_url

if __name__ == '__main__':
		
	# Select model type
	# model_type = ModelType.middlebury
	# model_type = ModelType.flyingthings
	model_type = ModelType.eth3d

	if model_type == ModelType.middlebury:
		model_path = "models/middlebury_d400/saved_model_480x640/model_float32.onnx"
	elif model_type == ModelType.flyingthings:
		model_path = "models/flyingthings_finalpass_xl/saved_model_480x640/model_float32.onnx"
	elif model_type == ModelType.eth3d:
		model_path = "models/eth3d/saved_model_480x640/model_float32.onnx"

	# Initialize model
	hitnet_depth = HitNet(model_path, model_type)

	# Load images
	left_img = imread_from_url("https://vision.middlebury.edu/stereo/data/scenes2003/newdata/cones/im2.png")
	right_img = imread_from_url("https://vision.middlebury.edu/stereo/data/scenes2003/newdata/cones/im6.png")

	# Estimate the depth
	disparity_map = hitnet_depth(left_img, right_img)

	color_disparity = draw_disparity(disparity_map)
	color_disparity = cv2.resize(color_disparity, (left_img.shape[1],left_img.shape[0]))

	combined_image = np.hstack((left_img, right_img, color_disparity))

	cv2.imwrite("out.jpg", combined_image)

	cv2.namedWindow("Estimated disparity", cv2.WINDOW_NORMAL)
	cv2.imshow("Estimated disparity", combined_image)
	cv2.waitKey(0)

	cv2.destroyAllWindows()

Jesus Espinoza · Answer 7 · Wed Sep 22 2021 06:53:28 GMT+0800 (China Standard Time)

@PINTO0309 Let me check with the development team for additional insight.

Regards,
Jesus

Ref. 65974

Jesus Espinoza · Answer 8 · Wed Jan 05 2022 01:17:19 GMT+0800 (China Standard Time)

@PINTO0309

Apologies for the delay. The development team has implemented a new GatherND_8 operation, which should be available in the next release. You could also try building from the master branch if you need it sooner.

#7743

Regards,
Jesus