PINTO0309 / onnx2tf

Self-Created Tools to convert ONNX files (NCHW) to TensorFlow/TFLite/Keras format (NHWC). The purpose of this tool is to solve the massive Transpose extrapolation problem in onnx-tensorflow (onnx-tf). I don't need a Star, but give me a pull request.

Error when onnx2tf: Floating point exception (core dumped)

MuffinTopSJY opened this issue

Issue Type

Others

OS

Linux

onnx2tf version number

1.19.5

onnx version number

1.15.0

onnxruntime version number

1.16.3

onnxsim (onnx_simplifier) version number

0.4.33

tensorflow version number

2.15.0

Download URL for ONNX

https://github.com/fabio-sim/LightGlue-ONNX/releases/download/v1.0.0/superpoint_lightglue_end2end_fused_cpu.onnx

Parameter Replacement JSON

-

Description

I converted the LightGlue model from PyTorch to ONNX, and inference succeeds in both frameworks.

import onnxruntime

# image0, image1: np.ndarray of shape (1, 1, 512, 512), values in [0, 1], dtype=float32
session = onnxruntime.InferenceSession("superpoint_lightglue_end2end_fused_cpu.onnx")
onnx_output = session.run(['kpts0', 'kpts1', 'matches0', 'mscores0'],
                          {'image0': image0, 'image1': image1})
print("[ONNX] Model Outputs:", [o.name for o in session.get_outputs()])
print("[ONNX] Model Predictions:", onnx_output)

The outputs of the ONNX model are listed below:

[ONNX] Model Outputs: ['kpts0', 'kpts1', 'matches0', 'mscores0']
[ONNX] Model Predictions: 
[array([[[214,   8],
        [326,   8],
        [424,   8],
        ...,
        [319, 501],
        [329, 503],
        [425, 503]]], dtype=int64), 
array([[[ 66,   8],
        [199,   8],
        [301,   8],
        ...,
        [474, 497],
        [500, 497],
        [278, 503]]], dtype=int64), 
array([[ 117,    5],
       [ 122,    8],
       [ 129,    2],
       ...,
       [1319, 1110],
       [1320, 1114],
       [1322, 1112]], dtype=int64), 
array([0.66974837, 0.9046258 , 0.96944857, 0.90069985, 0.94022113,
       0.9269928 , 0.9782594 , 0.18117692, 0.9418314 , 0.9777683 ,
       ... , 
       0.6197381 , 0.32578984, 0.90328395, 0.2957324 , 0.59147465,
       0.9046468 , 0.8206571 , 0.78984714, 0.8549151 , 0.75364023],
      dtype=float32)]

But when I try to convert the ONNX model to TFLite, this error occurs:

Floating point exception (core dumped).

I get no more information than this single line about the error.
'superpoint_lightglue_end2end_fused_cpu.onnx' is available online (see the download URL above). To rule out any influence from the ONNX version or similar, I converted the PyTorch model to ONNX again following your requirements, but I still got the same error.
I don't know why it occurs or how to deal with it, since no such error occurs when running the ONNX model itself.

Sincerely thank you for your time.

By the way, I used the code below to run the onnx2tf conversion.

import onnx2tf

onnx2tf.convert(
    input_onnx_file_path="superpoint_lightglue_end2end_fused_cpu.onnx",
    output_folder_path="model.tf",
    copy_onnx_input_output_names_to_tflite=True,
    non_verbose=True,  # suppresses the conversion log output
)
  1. The input shape should be fixed.
    e.g.

    onnx2tf \
    -i superpoint_lightglue_end2end_fused_cpu.onnx \
    -ois image0:1,1,240,320 image1:1,1,480,640
  2. Stop using NonZero and replace it with another OP; with NonZero in the graph, onnx2tf can't determine the correct channel location when transposing NCHW to NHWC. The number of elements in the inputs/outputs of the OPs that follow NonZero cannot be determined, which makes the conversion technically difficult. (A minimal sketch of this kind of rewrite is shown below.)

My particular implementation without NonZero.
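
A minimal sketch of the kind of rewrite meant in point 2, assuming a (B, H, W) keypoint score map; the function and its interface are illustrative, not the actual implementation linked above. Selecting a fixed top-k exports as TopK with a static output shape, instead of threshold + NonZero with a dynamic one.

import torch

def select_keypoints_static(scores: torch.Tensor, k: int = 300):
    # scores: (B, H, W) keypoint score map.
    # Thresholding + torch.nonzero() exports as NonZero with a dynamic output
    # shape; a fixed top-k exports as TopK, so every downstream shape is static.
    b, h, w = scores.shape
    flat = scores.reshape(b, -1)                        # (B, H*W)
    top_scores, top_idx = torch.topk(flat, k, dim=-1)   # (B, k), k known at export time
    xs = top_idx % w
    ys = torch.div(top_idx, w, rounding_mode="floor")
    kpts = torch.stack((xs, ys), dim=-1)                # (B, k, 2) as (x, y)
    return kpts, top_scores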

Many thanks! I'll try it again.

Hi, sorry to bother you again.

Following https://github.com/PINTO0309/LightGlue-ONNX, I converted LightGlue to an ONNX model with python export.py --img_size 512 512 --lightglue_path weights/sjy_fused_static.onnx --end2end, and the resulting ONNX model worked in my test.

When converting with onnx2tf -i sjy_fused_static.onnx or onnx2tf -i sjy_fused_static.onnx -ois image0:1,1,512,512 image1:1,1,512,512 (in superpoint.py I set top_num = 300), I got this error:

INFO: 1409 / 3391
INFO: onnx_op_type: Expand onnx_op_name: /lightglue/posenc/Expand
INFO:  input_name.1: /lightglue/posenc/Unsqueeze_3_output_0 shape: [2, 1, 1, 300, 32, 1] dtype: float32
INFO:  input_name.2: /lightglue/posenc/Where_output_0 shape: [6] dtype: int64
INFO:  output_name.1: /lightglue/posenc/Expand_output_0 shape: [2, 1, 1, 300, 32, 2] dtype: float32
ERROR: The trace log is below.
Traceback (most recent call last):
  File "/home/feiluo/.conda/envs/onnx2tf/lib/python3.10/site-packages/onnx2tf/utils/common_functions.py", line 310, in print_wrapper_func
    result = func(*args, **kwargs)
  File "/home/feiluo/.conda/envs/onnx2tf/lib/python3.10/site-packages/onnx2tf/utils/common_functions.py", line 383, in inverted_operation_enable_disable_wrapper_func
    result = func(*args, **kwargs)
  File "/home/feiluo/.conda/envs/onnx2tf/lib/python3.10/site-packages/onnx2tf/utils/common_functions.py", line 53, in get_replacement_parameter_wrapper_func
    func(*args, **kwargs)
  File "/home/feiluo/.conda/envs/onnx2tf/lib/python3.10/site-packages/onnx2tf/ops/Expand.py", line 118, in make_node
    expanded_tensor = input_tensor * ones
  File "/home/feiluo/.conda/envs/onnx2tf/lib/python3.10/site-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/feiluo/.conda/envs/onnx2tf/lib/python3.10/site-packages/keras/src/layers/core/tf_op_layer.py", line 119, in handle
    return TFOpLambda(op)(*args, **kwargs)
  File "/home/feiluo/.conda/envs/onnx2tf/lib/python3.10/site-packages/keras/src/utils/traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
ValueError: Exception encountered when calling layer "tf.math.multiply_46" (type TFOpLambda).

Dimensions must be equal, but are 32 and 2 for '{{node tf.math.multiply_46/Mul}} = Mul[T=DT_FLOAT](Placeholder, tf.math.multiply_46/Mul/y)' with input shapes: [1,2,1,300,32,1], [1,1,1,1,2,1].

Call arguments received by layer "tf.math.multiply_46" (type TFOpLambda):
  • x=tf.Tensor(shape=(1, 2, 1, 300, 32, 1), dtype=float32)
  • y=tf.Tensor(shape=(1, 1, 1, 1, 2, 1), dtype=float32)
  • name=None
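
For what it's worth, the shape conflict in that ValueError can be reproduced with plain NumPy broadcasting (a minimal sketch using the shapes from the log above; the assumption that the extra leading 1 and moved axes come from onnx2tf's NCHW-to-NHWC handling is my reading of the log, not confirmed):

import numpy as np

# What the ONNX Expand does: broadcast the last axis from 1 to 2 -- this works.
onnx_in = np.zeros((2, 1, 1, 300, 32, 1), dtype=np.float32)
print((onnx_in * np.ones((2, 1, 1, 300, 32, 2), dtype=np.float32)).shape)  # (2, 1, 1, 300, 32, 2)

# What the converted graph ends up doing: the tensor has become
# (1, 2, 1, 300, 32, 1) while the target shape is applied as (1, 1, 1, 1, 2, 1),
# so broadcasting pairs 32 with 2 and fails.
tf_in = np.zeros((1, 2, 1, 300, 32, 1), dtype=np.float32)
try:
    tf_in * np.ones((1, 1, 1, 1, 2, 1), dtype=np.float32)
except ValueError as err:
    print(err)  # operands could not be broadcast together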

The structure of sjy_fused_static.onnx around onnx_op_name: /lightglue/posenc/Expand looks roughly like this:

[structure diagram of the graph around /lightglue/posenc/Expand]

Besides, I have doubts about the results below: the output shapes on the TF side don't match the ONNX outputs. I don't know whether it matters; for example:

INFO: 1401 / 3391
INFO: onnx_op_type: Concat onnx_op_name: /lightglue/posenc/Concat
INFO:  input_name.1: /lightglue/posenc/Unsqueeze_output_0 shape: [1, 1, 300, 32] dtype: float32
INFO:  input_name.2: /lightglue/posenc/Unsqueeze_1_output_0 shape: [1, 1, 300, 32] dtype: float32
INFO:  output_name.1: /lightglue/posenc/Concat_output_0 shape: [2, 1, 300, 32] dtype: float32
INFO: tf_op_type: concat
INFO:  input.1.input0: name: tf.reshape_36/Reshape:0 shape: (1, 1, 300, 32) dtype: <dtype: 'float32'> 
INFO:  input.2.input1: name: tf.reshape_37/Reshape:0 shape: (1, 1, 300, 32) dtype: <dtype: 'float32'> 
INFO:  input.3.axis: val: 0 
INFO:  output.1.output: name: tf.concat_29/concat:0 shape: (1, 2, 300, 32) dtype: <dtype: 'float32'> 

INFO: 1402 / 3391
INFO: onnx_op_type: Concat onnx_op_name: /lightglue/posenc_1/Concat
INFO:  input_name.1: /lightglue/posenc_1/Unsqueeze_output_0 shape: [1, 1, 300, 32] dtype: float32
INFO:  input_name.2: /lightglue/posenc_1/Unsqueeze_1_output_0 shape: [1, 1, 300, 32] dtype: float32
INFO:  output_name.1: /lightglue/posenc_1/Concat_output_0 shape: [2, 1, 300, 32] dtype: float32
INFO: tf_op_type: concat
INFO:  input.1.input0: name: tf.reshape_38/Reshape:0 shape: (1, 1, 300, 32) dtype: <dtype: 'float32'> 
INFO:  input.2.input1: name: tf.reshape_39/Reshape:0 shape: (1, 1, 300, 32) dtype: <dtype: 'float32'> 
INFO:  input.3.axis: val: 0 
INFO:  output.1.output: name: tf.concat_35/concat:0 shape: (1, 2, 300, 32) dtype: <dtype: 'float32'> 

INFO: 1405 / 3391
INFO: onnx_op_type: Unsqueeze onnx_op_name: /lightglue/posenc/Unsqueeze_3
INFO:  input_name.1: /lightglue/posenc/Concat_output_0 shape: [2, 1, 300, 32] dtype: float32
INFO:  input_name.2: 4856 shape: [2] dtype: int64
INFO:  output_name.1: /lightglue/posenc/Unsqueeze_3_output_0 shape: [2, 1, 1, 300, 32, 1] dtype: float32
INFO: tf_op_type: reshape
INFO:  input.1.tensor: name: tf.concat_29/concat:0 shape: (1, 2, 300, 32) dtype: <dtype: 'float32'> 
INFO:  input.2.shape: val: [1, 2, 1, 300, 32, 1] 
INFO:  output.1.output: name: tf.reshape_40/Reshape:0 shape: (1, 2, 1, 300, 32, 1) dtype: <dtype: 'float32'> 

INFO: 1406 / 3391
INFO: onnx_op_type: Unsqueeze onnx_op_name: /lightglue/posenc_1/Unsqueeze_3
INFO:  input_name.1: /lightglue/posenc_1/Concat_output_0 shape: [2, 1, 300, 32] dtype: float32
INFO:  input_name.2: 4856 shape: (2,) dtype: int64
INFO:  output_name.1: /lightglue/posenc_1/Unsqueeze_3_output_0 shape: [2, 1, 1, 300, 32, 1] dtype: float32
INFO: tf_op_type: reshape
INFO:  input.1.tensor: name: tf.concat_35/concat:0 shape: (1, 2, 300, 32) dtype: <dtype: 'float32'> 
INFO:  input.2.shape: val: [1, 2, 1, 300, 32, 1] 
INFO:  output.1.output: name: tf.reshape_41/Reshape:0 shape: (1, 2, 1, 300, 32, 1) dtype: <dtype: 'float32'> 

I tried to figure it out myself but got stuck. T_T
Sincerely, thank you for your time again.

No solution to add, but confirmation that I am seeing the same issue with top_num = 256:

ValueError: Exception encountered when calling layer "tf.math.multiply_44" (type TFOpLambda).

Dimensions must be equal, but are 32 and 2 for '{{node tf.math.multiply_44/Mul}} = Mul[T=DT_FLOAT](Placeholder, tf.math.multiply_44/Mul/y)' with input shapes: [1,2,1,256,32,1], [1,1,1,1,2,1].

Call arguments received by layer "tf.math.multiply_44" (type TFOpLambda):
  • x=tf.Tensor(shape=(1, 2, 1, 256, 32, 1), dtype=float32)
  • y=tf.Tensor(shape=(1, 1, 1, 1, 2, 1), dtype=float32)
  • name=None

Really appreciate this thread. I've been trying different approaches to get LightGlue to TFLite for a while now, and this is as close as I've gotten.

Hi, sorry to bother you. Have you solved this problem?

If there is no activity within the next two days, this issue will be closed automatically.

Confirmation was delayed because I was spending time optimizing other models.

There are too many dimensions, and automatic conversion by onnx2tf has its limits. The following parameter-replacement procedure should be used to correct the dimensions by hand, but the structure is too complex for me to devote sufficient time to it.

https://github.com/PINTO0309/onnx2tf?tab=readme-ov-file#parameter-replacement
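
For reference, a hedged sketch of what a parameter replacement file for the posenc Concat axes above might look like, written as Python that emits the JSON. The op names are taken from the logs in this thread, but the param_target/values entries are illustrative guesses and not a verified fix; the README section linked above documents the actual format and options.

import json

# Illustrative only: pin the posenc Concat ops back to axis 0 so the
# downstream Unsqueeze/Expand see the ONNX-side layout. Not a verified fix.
replacement = {
    "format_version": 1,
    "operations": [
        {
            "op_name": "/lightglue/posenc/Concat",
            "param_target": "attributes",
            "param_name": "axis",
            "values": 0,
        },
        {
            "op_name": "/lightglue/posenc_1/Concat",
            "param_target": "attributes",
            "param_name": "axis",
            "values": 0,
        },
    ],
}

with open("replace_lightglue.json", "w") as f:
    json.dump(replacement, f, indent=2)

# Assumed CLI usage (see the README link above):
#   onnx2tf -i sjy_fused_static.onnx -prf replace_lightglue.json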

If there is no activity within the next two days, this issue will be closed automatically.

Duplicate of #569