PINTO0309 / onnx2tf

Self-Created Tools to convert ONNX files (NCHW) to TensorFlow/TFLite/Keras format (NHWC). The purpose of this tool is to solve the massive Transpose extrapolation problem in onnx-tensorflow (onnx-tf). I don't need a Star, but give me a pull request.


[Dynamic batch / Dynamic shape] onnx model with dynamic input is converted to tflite with static input 1

mikel-brostrom opened this issue

Issue Type

Others

OS

Linux

onnx2tf version number

1.15.8

onnx version number

1.13.0

onnxruntime version number

1.13.1

onnxsim (onnx_simplifier) version number

0.4.33

tensorflow version number

2.13.0

Download URL for ONNX

osnet_x0_25_msmt17.zip

Parameter Replacement JSON

NA

Description

Hi @PINTO0309!

I have the following issue

ONNX input:

(screenshot: dynamic batch dimension)

TFLite (FP32 model) input:
(screenshot: batch dimension fixed to 1)

after conversion by: onnx2tf -i examples/weights/osnet_x0_25_msmt17.onnx -o /home/mikel.brostrom/yolo_tracking/examples/weights/osnet_x0_25_msmt17_saved_model -nuo --non_verbose

I went through the README but couldn't find any reason for this behavior. -b 10 works as expected, but my input varies depending on the image, so the input needs to be dynamic. The output size is also fixed to the static input value.
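For context, the ONNX model itself runs fine with a variable batch; a quick onnxruntime check (a sketch, assuming the NCHW export shape [N, 3, 256, 128]) looks something like:

import numpy as np
import onnxruntime as ort

# Sketch only: the input name is read from the session, the batch size is arbitrary.
sess = ort.InferenceSession("examples/weights/osnet_x0_25_msmt17.onnx",
                            providers=["CPUExecutionProvider"])
input_name = sess.get_inputs()[0].name
out = sess.run(None, {input_name: np.ones([5, 3, 256, 128], dtype=np.float32)})
print(out[0].shape)  # expected something like (5, 512)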

There is no problem with the model conversion operation itself. The problem lies with Netron's graphical display feature. The evidence is presented below.

  • Step.1

    onnx2tf -i osnet_x0_25_msmt17.onnx -osd --non_verbose
    • tflite
      When viewing tflite in Netron, the batch size appears to be fixed at 1.
      (screenshot)
    • saved_model
      However, checking the structure of saved_model, the batch size is correctly set to -1.
      saved_model_cli show --dir saved_model/ --all
      
      MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:
      
      signature_def['__saved_model_init_op']:
        The given SavedModel SignatureDef contains the following input(s):
        The given SavedModel SignatureDef contains the following output(s):
          outputs['__saved_model_init_op'] tensor_info:
              dtype: DT_INVALID
              shape: unknown_rank
              name: NoOp
        Method name is: 
      
      signature_def['serving_default']:
        The given SavedModel SignatureDef contains the following input(s):
          inputs['images'] tensor_info:
              dtype: DT_FLOAT
              shape: (-1, 256, 128, 3)
              name: serving_default_images:0
        The given SavedModel SignatureDef contains the following output(s):
          outputs['output'] tensor_info:
              dtype: DT_FLOAT
              shape: (-1, 512)
              name: PartitionedCall:0
        Method name is: tensorflow/serving/predict
  • Step.2
    To prove that the tflite structure has been converted correctly, I will convert the tflite to JSON and look at the structure.

    docker run --rm -it \
    -v `pwd`:/home/user/workdir \
    ghcr.io/pinto0309/tflite2json2tflite:latest
    
    ./flatc -t \
    --strict-json \
    --defaults-json \
    -o workdir \
    ./schema.fbs -- workdir/saved_model/osnet_x0_25_msmt17_float32.tflite
    
    ls -l workdir
    
    -rw-rw-r-- 1 user user   921564 Aug  4 10:24 osnet_x0_25_msmt17.onnx
    -rw-r--r-- 1 user user 10369524 Aug  4 10:30 osnet_x0_25_msmt17_float32.json
    drwxrwxr-x 4 user user     4096 Aug  4 10:26 saved_model

    (screenshot)

    • osnet_x0_25_msmt17_float32.json
      "shape_signature" is correctly set to -1. However, "shape" is set to 1. This could be a problem with TFLiteConverter, or it could be a problem with Netron's graphical display capabilities.
      (screenshot)

In other words, onnx2tf invokes TFLiteConverter as specified, with a batch size of -1 and without any additional model processing; only Netron's display is broken. This is a problem I have known about for quite some time. However, it does not affect inference itself. The strings and values ultimately written to the tflite file (FlatBuffers) cannot be controlled from onnx2tf.
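You can also confirm this from the interpreter itself; a minimal check (assuming the tflite generated above) looks something like:

import tensorflow as tf
from pprint import pprint

# "shape" shows the dummy 1, but "shape_signature" shows the real -1 batch dimension.
interpreter = tf.lite.Interpreter(model_path="saved_model/osnet_x0_25_msmt17_float32.tflite")
pprint(interpreter.get_input_details())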

Thank you so much for your rapid reply and your time once again 😄

Yup, I printed self.interpreter.get_input_details() and got:

[{'name': 'inputs_0', 'index': 0, 'shape': array([  1, 256, 128,   3], dtype=int32), 'shape_signature': array([ -1, 256, 128,   3], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}]

I guess there is something weird going on in TFLiteConverter. I can run the ONNX model with dynamic inputs, but the TFLite one crashes...
[1, 256, 128, 3] is the dummy input shape I use for torch.onnx.export

If you want to run inference with a variable batch size, you need to infer via a signature. In that case, the -coion option must be specified when converting the model. Note that I have identified a problem with quantization when the -coion option is used, which can corrupt tflite files. #429

'shape_signature': array([ -1, 256, 128, 3], dtype=int32)
interpreter.get_signature_runner()

https://github.com/PINTO0309/onnx2tf#4-match-tflite-inputoutput-names-and-inputoutput-order-to-onnx

  • convert
    onnx2tf -i osnet_x0_25_msmt17.onnx -osd -coion --non_verbose
    (screenshot)
  • test.py - Batch size: 5
    import numpy as np
    import tensorflow as tf
    from pprint import pprint
    
    interpreter = tf.lite.Interpreter(model_path="saved_model/osnet_x0_25_msmt17_float32.tflite")
    tf_lite_model = interpreter.get_signature_runner()
    inputs = {
        'images': np.ones([5,256,128,3], dtype=np.float32),
    }
    tf_lite_output = tf_lite_model(**inputs)
    print(f"[TFLite] Model Predictions shape: {tf_lite_output['output'].shape}")
    print(f"[TFLite] Model Predictions:")
    pprint(tf_lite_output)
  • results
    [TFLite] Model Predictions shape: (5, 512)
    [TFLite] Model Predictions:
    {'output': array([[0.0000000e+00, 2.4730086e-04, 0.0000000e+00, ..., 1.0528549e+00,
            3.7874988e-01, 0.0000000e+00],
           [0.0000000e+00, 2.4730086e-04, 0.0000000e+00, ..., 1.0528549e+00,
            3.7874988e-01, 0.0000000e+00],
           [0.0000000e+00, 2.4730086e-04, 0.0000000e+00, ..., 1.0528549e+00,
            3.7874988e-01, 0.0000000e+00],
           [0.0000000e+00, 2.4730086e-04, 0.0000000e+00, ..., 1.0528549e+00,
            3.7874988e-01, 0.0000000e+00],
           [0.0000000e+00, 2.4730084e-04, 0.0000000e+00, ..., 1.0528525e+00,
            3.7874976e-01, 0.0000000e+00]], dtype=float32)}
    
  • test.py - Batch size: 3
    import numpy as np
    import tensorflow as tf
    from pprint import pprint
    
    interpreter = tf.lite.Interpreter(model_path="saved_model/osnet_x0_25_msmt17_float32.tflite")
    tf_lite_model = interpreter.get_signature_runner()
    inputs = {
        'images': np.ones([3,256,128,3], dtype=np.float32),
    }
    tf_lite_output = tf_lite_model(**inputs)
    print(f"[TFLite] Model Predictions shape: {tf_lite_output['output'].shape}")
    print(f"[TFLite] Model Predictions:")
    pprint(tf_lite_output)
  • results
    [TFLite] Model Predictions shape: (3, 512)
    [TFLite] Model Predictions:
    {'output': array([[0.0000000e+00, 2.4730084e-04, 0.0000000e+00, ..., 1.0528525e+00,
            3.7874976e-01, 0.0000000e+00],
           [0.0000000e+00, 2.4730084e-04, 0.0000000e+00, ..., 1.0528525e+00,
            3.7874976e-01, 0.0000000e+00],
           [0.0000000e+00, 2.4730084e-04, 0.0000000e+00, ..., 1.0528525e+00,
            3.7874976e-01, 0.0000000e+00]], dtype=float32)}
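For reference, the same dynamic-batch inference should also work with the plain interpreter by resizing the input tensor before allocation (a sketch, not part of the examples above; it relies on shape_signature being -1):

import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="saved_model/osnet_x0_25_msmt17_float32.tflite")
input_details = interpreter.get_input_details()[0]
output_details = interpreter.get_output_details()[0]

# Resize the dynamic batch dimension, then allocate and run as usual.
interpreter.resize_tensor_input(input_details['index'], [5, 256, 128, 3])
interpreter.allocate_tensors()
interpreter.set_tensor(input_details['index'], np.ones([5, 256, 128, 3], dtype=np.float32))
interpreter.invoke()
print(interpreter.get_tensor(output_details['index']).shape)  # expected something like (5, 512)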
    

onnx2tf -i examples/weights/osnet_x0_25_msmt17.onnx -o /home/mikel.brostrom/yolo_tracking/examples/weights/osnet_x0_25_msmt17_saved_model -osd -coion --non_verbose

works, no problem. But when I run:

import numpy as np
import tensorflow as tf
from pprint import pprint

interpreter = tf.lite.Interpreter(model_path="/home/mikel.brostrom/yolo_tracking/examples/weights/osnet_x0_25_msmt17_saved_model/osnet_x0_25_msmt17_float32.tflite")
tf_lite_model = interpreter.get_signature_runner()
inputs = {
    'images': np.ones([5,256,128,3], dtype=np.float32),
}
tf_lite_output = tf_lite_model(**inputs)
print(f"[TFLite] Model Predictions shape: {tf_lite_output['output'].shape}")
print(f"[TFLite] Model Predictions:")
pprint(tf_lite_output)

I get:

lite/python/interpreter.py", line 853, in get_signature_runner
    raise ValueError(
ValueError: SignatureDef signature_key is None and model has 0 Signatures. None is only allowed when the model has 1 SignatureDef

Should this be added manually?

Are all the necessary packages installed, including flatbuffers-compiler?
https://github.com/PINTO0309/onnx2tf#environment

If it doesn't work, try Docker.

docker run --rm -it \
-v `pwd`:/workdir \
-w /workdir \
docker.io/pinto0309/onnx2tf:1.15.8
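
You can also check whether the converted tflite actually contains a SignatureDef; a minimal check (assuming the same output path as above, adjust as needed) looks something like:

import tensorflow as tf

# Adjust the path to wherever your converted tflite lives.
interpreter = tf.lite.Interpreter(
    model_path="osnet_x0_25_msmt17_saved_model/osnet_x0_25_msmt17_float32.tflite")
# An empty dict here means the file was written without any SignatureDef
# (e.g. converted without -osd / -coion).
print(interpreter.get_signature_list())
# expected something like: {'serving_default': {'inputs': ['images'], 'outputs': ['output']}}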

Yup, I installed all the packages mentioned in the README (flatbuffers-compiler included). Will try Docker.

I get the same issue there:

user@69584e9dc119:/workdir$ python examples/weights/test.py 
Traceback (most recent call last):
  File "examples/weights/test.py", line 6, in <module>
    tf_lite_model = interpreter.get_signature_runner()
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/lite/python/interpreter.py", line 853, in get_signature_runner
    raise ValueError(
ValueError: SignatureDef signature_key is None and model has 0 Signatures. None is only allowed when the model has 1 SignatureDef
user@69584e9dc119:/workdir$

test.py contains:

import numpy as np
import tensorflow as tf
from pprint import pprint

interpreter = tf.lite.Interpreter(model_path="/workdir/examples/weights/osnet_x0_25_msmt17_saved_model/osnet_x0_25_msmt17_float32.tflite")
tf_lite_model = interpreter.get_signature_runner()
inputs = {
    'images': np.ones([5,256,128,3], dtype=np.float32),
}
tf_lite_output = tf_lite_model(**inputs)
print(f"[TFLite] Model Predictions shape: {tf_lite_output['output'].shape}")
print(f"[TFLite] Model Predictions:")
pprint(tf_lite_output)

Have to catch a train. So will have to continue looking at this later today 😄

What is examples/weights/tttt.py?
Please describe the exact command you executed.

Once the conversion is performed in the Docker container, there should be no errors. Also, if you are running test.py correctly, no error can occur. Be sure to check that the file path is correct. The problems you are seeing are unique to your environment.

docker run --rm -it \
-v `pwd`:/workdir \
-w /workdir \
docker.io/pinto0309/onnx2tf:1.15.8

onnx2tf \
-i osnet_x0_25_msmt17.onnx \
-o saved_model \
-osd \
-coion \
--non_verbose

The onnx2tf command is not failing. What is failing is the inference, in test.py.

I have known that from the beginning.

I am concerned that your host PC environment was corrupted at the time the model was converted. Please redo everything in Docker.

Thanks for your patience. Will try Docker later today from scratch :)

Everything that could go wrong went wrong 🤣. My bad, it was my environment. I have it working after following your suggestions:

tflite model input torch.Size([1, 256, 128, 3])
tflite model output (1, 512)
0: 480x640 1 person, 9.1ms

tflite model input torch.Size([2, 256, 128, 3])
tflite model output (2, 512)
0: 480x640 1 person, 1 chair, 9.5ms

tflite model input torch.Size([2, 256, 128, 3])
tflite model output (2, 512)
0: 480x640 1 person, 1 chair, 15.5ms

I will use the provided Docker image from now on when doing onnx2tf stuff 😄

Glad to hear it went well.

I have added it to the README and will close it.

Great tutorial for dynamic batch inference using TFLite models! It was much needed IMO.