PINTO0309 / onnx2tf

Self-Created Tools to convert ONNX files (NCHW) to TensorFlow/TFLite/Keras format (NHWC). The purpose of this tool is to solve the massive Transpose extrapolation problem in onnx-tensorflow (onnx-tf). I don't need a Star, but give me a pull request.


[Dynamic batch / Dynamic shape] onnx model with dynamic input is converted to tflite with static input 1

mikel-brostrom opened this issue

Issue Type

Others

OS

Linux

onnx2tf version number

1.15.8

onnx version number

1.13.0

onnxruntime version number

1.13.1

onnxsim (onnx_simplifier) version number

0.4.33

tensorflow version number

2.13.0

Download URL for ONNX

osnet_x0_25_msmt17.zip

Parameter Replacement JSON

NA

Description

Hi @PINTO0309!

I have the following issue

ONNX input:

(screenshot: dynamic batch dimension)

TFLite (FP32 model) input:
(screenshot: batch dimension fixed to 1)

after conversion by: onnx2tf -i examples/weights/osnet_x0_25_msmt17.onnx -o /home/mikel.brostrom/yolo_tracking/examples/weights/osnet_x0_25_msmt17_saved_model -nuo --non_verbose

I went through the README but couldn't find any reason for this behavior. -b 10 works as expected, but my input varies depending on the image, so the input needs to be dynamic. The output size is also fixed to the static input value.
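For context, the ONNX model itself runs fine with a variable batch; a quick onnxruntime check (a sketch, assuming the NCHW export shape [N, 3, 256, 128]) looks something like:

import numpy as np
import onnxruntime as ort

# Sketch only: the input name is read from the session, the batch size is arbitrary.
sess = ort.InferenceSession("examples/weights/osnet_x0_25_msmt17.onnx",
                            providers=["CPUExecutionProvider"])
input_name = sess.get_inputs()[0].name
out = sess.run(None, {input_name: np.ones([5, 3, 256, 128], dtype=np.float32)})
print(out[0].shape)  # expected something like (5, 512)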

There is no problem with the model conversion operation itself. The problem lies with Netron's graphical display feature. The evidence is presented below.

  • Step.1

    onnx2tf -i osnet_x0_25_msmt17.onnx -osd --non_verbose
    • tflite
      When viewing tflite in Netron, the batch size appears to be fixed at 1.
      (screenshot)
    • saved_model
      However, checking the structure of saved_model, the batch size is correctly set to -1.
      saved_model_cli show --dir saved_model/ --all
      
      MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:
      
      signature_def['__saved_model_init_op']:
        The given SavedModel SignatureDef contains the following input(s):
        The given SavedModel SignatureDef contains the following output(s):
          outputs['__saved_model_init_op'] tensor_info:
              dtype: DT_INVALID
              shape: unknown_rank
              name: NoOp
        Method name is: 
      
      signature_def['serving_default']:
        The given SavedModel SignatureDef contains the following input(s):
          inputs['images'] tensor_info:
              dtype: DT_FLOAT
              shape: (-1, 256, 128, 3)
              name: serving_default_images:0
        The given SavedModel SignatureDef contains the following output(s):
          outputs['output'] tensor_info:
              dtype: DT_FLOAT
              shape: (-1, 512)
              name: PartitionedCall:0
        Method name is: tensorflow/serving/predict
  • Step.2
    To prove that the tflite structure has been converted correctly, I will convert the tflite to JSON and look at the structure.

    docker run --rm -it \
    -v `pwd`:/home/user/workdir \
    ghcr.io/pinto0309/tflite2json2tflite:latest
    
    ./flatc -t \
    --strict-json \
    --defaults-json \
    -o workdir \
    ./schema.fbs -- workdir/saved_model/osnet_x0_25_msmt17_float32.tflite
    
    ls -l workdir
    
    -rw-rw-r-- 1 user user   921564 Aug  4 10:24 osnet_x0_25_msmt17.onnx
    -rw-r--r-- 1 user user 10369524 Aug  4 10:30 osnet_x0_25_msmt17_float32.json
    drwxrwxr-x 4 user user     4096 Aug  4 10:26 saved_model

    (screenshot)

    • osnet_x0_25_msmt17_float32.json
      "shape_signature" is correctly set to -1. However, "shape" is set to 1. This could be a problem with TFLiteConverter, or it could be a problem with Netron's graphical display capabilities.
      (screenshot)

In other words, onnx2tf invokes TFLiteConverter as specified, with a batch size of -1 and without any additional model processing; only Netron's display is broken. This is a problem I have known about for quite some time. However, it does not affect inference itself. The strings and values ultimately written to the tflite file (FlatBuffers) cannot be controlled from onnx2tf.
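You can also confirm this from the interpreter itself; a minimal check (assuming the tflite generated above) looks something like:

import tensorflow as tf
from pprint import pprint

# "shape" shows the dummy 1, but "shape_signature" shows the real -1 batch dimension.
interpreter = tf.lite.Interpreter(model_path="saved_model/osnet_x0_25_msmt17_float32.tflite")
pprint(interpreter.get_input_details())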

Thank you so much for your rapid reply and your time once again 😄

Yup, I printed self.interpreter.get_input_details() and got:

[{'name': 'inputs_0', 'index': 0, 'shape': array([  1, 256, 128,   3], dtype=int32), 'shape_signature': array([ -1, 256, 128,   3], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}]

I guess there is something weird going on in TFLiteConverter. I can run the ONNX model with dynamic inputs, but the TFLite one crashes...
[1, 256, 128, 3] is the dummy input shape I use for torch.onnx.export

If you want to run inference with a variable batch size, you need to infer via a signature. In that case, the -coion option must be specified when converting the model. Note that I have identified a problem with quantization when the -coion option is used, which can corrupt tflite files. #429

'shape_signature': array([ -1, 256, 128, 3], dtype=int32)
interpreter.get_signature_runner()

https://github.com/PINTO0309/onnx2tf#4-match-tflite-inputoutput-names-and-inputoutput-order-to-onnx

  • convert
    onnx2tf -i osnet_x0_25_msmt17.onnx -osd -coion --non_verbose
    (screenshot)
  • test.py - Batch size: 5
    import numpy as np
    import tensorflow as tf
    from pprint import pprint
    
    interpreter = tf.lite.Interpreter(model_path="saved_model/osnet_x0_25_msmt17_float32.tflite")
    tf_lite_model = interpreter.get_signature_runner()
    inputs = {
        'images': np.ones([5,256,128,3], dtype=np.float32),
    }
    tf_lite_output = tf_lite_model(**inputs)
    print(f"[TFLite] Model Predictions shape: {tf_lite_output['output'].shape}")
    print(f"[TFLite] Model Predictions:")
    pprint(tf_lite_output)
  • results
    [TFLite] Model Predictions shape: (5, 512)
    [TFLite] Model Predictions:
    {'output': array([[0.0000000e+00, 2.4730086e-04, 0.0000000e+00, ..., 1.0528549e+00,
            3.7874988e-01, 0.0000000e+00],
           [0.0000000e+00, 2.4730086e-04, 0.0000000e+00, ..., 1.0528549e+00,
            3.7874988e-01, 0.0000000e+00],
           [0.0000000e+00, 2.4730086e-04, 0.0000000e+00, ..., 1.0528549e+00,
            3.7874988e-01, 0.0000000e+00],
           [0.0000000e+00, 2.4730086e-04, 0.0000000e+00, ..., 1.0528549e+00,
            3.7874988e-01, 0.0000000e+00],
           [0.0000000e+00, 2.4730084e-04, 0.0000000e+00, ..., 1.0528525e+00,
            3.7874976e-01, 0.0000000e+00]], dtype=float32)}
    
  • test.py - Batch size: 3
    import numpy as np
    import tensorflow as tf
    from pprint import pprint
    
    interpreter = tf.lite.Interpreter(model_path="saved_model/osnet_x0_25_msmt17_float32.tflite")
    tf_lite_model = interpreter.get_signature_runner()
    inputs = {
        'images': np.ones([3,256,128,3], dtype=np.float32),
    }
    tf_lite_output = tf_lite_model(**inputs)
    print(f"[TFLite] Model Predictions shape: {tf_lite_output['output'].shape}")
    print(f"[TFLite] Model Predictions:")
    pprint(tf_lite_output)
  • results
    [TFLite] Model Predictions shape: (3, 512)
    [TFLite] Model Predictions:
    {'output': array([[0.0000000e+00, 2.4730084e-04, 0.0000000e+00, ..., 1.0528525e+00,
            3.7874976e-01, 0.0000000e+00],
           [0.0000000e+00, 2.4730084e-04, 0.0000000e+00, ..., 1.0528525e+00,
            3.7874976e-01, 0.0000000e+00],
           [0.0000000e+00, 2.4730084e-04, 0.0000000e+00, ..., 1.0528525e+00,
            3.7874976e-01, 0.0000000e+00]], dtype=float32)}
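For reference, the same dynamic-batch inference should also work with the plain interpreter by resizing the input tensor before allocation (a sketch, not part of the examples above; it relies on shape_signature being -1):

import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="saved_model/osnet_x0_25_msmt17_float32.tflite")
input_details = interpreter.get_input_details()[0]
output_details = interpreter.get_output_details()[0]

# Resize the dynamic batch dimension, then allocate and run as usual.
interpreter.resize_tensor_input(input_details['index'], [5, 256, 128, 3])
interpreter.allocate_tensors()
interpreter.set_tensor(input_details['index'], np.ones([5, 256, 128, 3], dtype=np.float32))
interpreter.invoke()
print(interpreter.get_tensor(output_details['index']).shape)  # expected something like (5, 512)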
    

onnx2tf -i examples/weights/osnet_x0_25_msmt17.onnx -o /home/mikel.brostrom/yolo_tracking/examples/weights/osnet_x0_25_msmt17_saved_model -osd -coion --non_verbose

works, no problem. But when I run:

import numpy as np
import tensorflow as tf
from pprint import pprint

interpreter = tf.lite.Interpreter(model_path="/home/mikel.brostrom/yolo_tracking/examples/weights/osnet_x0_25_msmt17_saved_model/osnet_x0_25_msmt17_float32.tflite")
tf_lite_model = interpreter.get_signature_runner()
inputs = {
    'images': np.ones([5,256,128,3], dtype=np.float32),
}
tf_lite_output = tf_lite_model(**inputs)
print(f"[TFLite] Model Predictions shape: {tf_lite_output['output'].shape}")
print(f"[TFLite] Model Predictions:")
pprint(tf_lite_output)

I get:

lite/python/interpreter.py", line 853, in get_signature_runner
    raise ValueError(
ValueError: SignatureDef signature_key is None and model has 0 Signatures. None is only allowed when the model has 1 SignatureDef

Should this be added manually?

Are all the necessary packages installed, including flatbuffers-compiler?
https://github.com/PINTO0309/onnx2tf#environment

If it doesn't work, try Docker.

docker run --rm -it \
-v `pwd`:/workdir \
-w /workdir \
docker.io/pinto0309/onnx2tf:1.15.8
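
You can also check whether the converted tflite actually contains a SignatureDef; a minimal check (assuming the same output path as above, adjust as needed) looks something like:

import tensorflow as tf

# Adjust the path to wherever your converted tflite lives.
interpreter = tf.lite.Interpreter(
    model_path="osnet_x0_25_msmt17_saved_model/osnet_x0_25_msmt17_float32.tflite")
# An empty dict here means the file was written without any SignatureDef
# (e.g. converted without -osd / -coion).
print(interpreter.get_signature_list())
# expected something like: {'serving_default': {'inputs': ['images'], 'outputs': ['output']}}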

Yup, I installed all the packages mentioned in the README (flatbuffers-compiler included). Will try Docker.

I get the same issue there:

user@69584e9dc119:/workdir$ python examples/weights/test.py 
Traceback (most recent call last):
  File "examples/weights/test.py", line 6, in <module>
    tf_lite_model = interpreter.get_signature_runner()
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/lite/python/interpreter.py", line 853, in get_signature_runner
    raise ValueError(
ValueError: SignatureDef signature_key is None and model has 0 Signatures. None is only allowed when the model has 1 SignatureDef
user@69584e9dc119:/workdir$

test.py contains:

import numpy as np
import tensorflow as tf
from pprint import pprint

interpreter = tf.lite.Interpreter(model_path="/workdir/examples/weights/osnet_x0_25_msmt17_saved_model/osnet_x0_25_msmt17_float32.tflite")
tf_lite_model = interpreter.get_signature_runner()
inputs = {
    'images': np.ones([5,256,128,3], dtype=np.float32),
}
tf_lite_output = tf_lite_model(**inputs)
print(f"[TFLite] Model Predictions shape: {tf_lite_output['output'].shape}")
print(f"[TFLite] Model Predictions:")
pprint(tf_lite_output)

Have to catch a train. So will have to continue looking at this later today 😄

What is examples/weights/tttt.py?
Please describe the exact command you executed.

Once the conversion is performed in the Docker container, there should be no errors. Also, if you are running test.py correctly, no error can occur. Be sure to check that the file path is correct. The problems you are seeing are unique to your environment.

docker run --rm -it \
-v `pwd`:/workdir \
-w /workdir \
docker.io/pinto0309/onnx2tf:1.15.8

onnx2tf \
-i osnet_x0_25_msmt17.onnx \
-o saved_model \
-osd \
-coion \
--non_verbose

The onnx2tf command is not failing. What is failing is the inference, in test.py.

I have known that from the beginning.

I am concerned that your host PC environment was corrupted at the time the model was converted. Please redo everything in Docker.

Thanks for your patience. Will try Docker later today from scratch :)

Everything that could go wrong went wrong 🤣. My bad, it was my environment. I have it working after following your suggestions:

tflite model input torch.Size([1, 256, 128, 3])
tflite model output (1, 512)
0: 480x640 1 person, 9.1ms

tflite model input torch.Size([2, 256, 128, 3])
tflite model output (2, 512)
0: 480x640 1 person, 1 chair, 9.5ms

tflite model input torch.Size([2, 256, 128, 3])
tflite model output (2, 512)
0: 480x640 1 person, 1 chair, 15.5ms

I will use the provided Docker image from now on when doing onnx2tf stuff 😄

Glad to hear it went well.

I have added it to the README and will close it.

Great tutorial for dynamic batch inference using TFLite models! It was much needed IMO.