PINTO0309 / onnx2tf

Self-Created Tools to convert ONNX files (NCHW) to TensorFlow/TFLite/Keras format (NHWC). The purpose of this tool is to solve the massive Transpose extrapolation problem in onnx-tensorflow (onnx-tf). I don't need a Star, but give me a pull request.

Resize operation fails (['unk__0', 'unk__1', 'unk__2', 'unk__3']) and raises UnboundLocalError: local variable 'new_size' referenced before assignment

CarlosNacher opened this issue · comments

Issue Type

Others

OS

Linux

onnx2tf version number

1.19.15

onnx version number

1.15.0

onnxruntime version number

1.17.1

onnxsim (onnx_simplifier) version number

0.4.33

tensorflow version number

2.15.0.post1

Download URL for ONNX

https://we.tl/t-HJjXDINMND

Parameter Replacement JSON

{
  "format_version": 1,
  "operations": [
    {
      "op_name": "/ae/decoder/Resize",
      "param_target": "outputs", 
      "param_name": "/ae/decoder/Resize_output_0",
      "values": [1, 64, 25, 12] 
    }
  ]
}

Description

Link to the Google Colab notebook with all the execution: https://colab.research.google.com/drive/1StDa10u2DytLO_8IivYMMUNuI_dndLpr?usp=sharing

  1. Purpose: I am trying to convert my .onnx model into a .pb one because I want to embed my model in a Google Coral TPU USB Accelerator. So, following https://coral.ai/docs/edgetpu/models-intro/#compatibility-overview, the first thing I have to do is convert my PyTorch model to ONNX (done), then to TensorFlow (this is where I am stuck), then quantize, convert to tflite, etc.
  2. What: My model seems to fail in a Resize operation. I am providing the param_replacement JSON, but I don't know whether I am using it properly, because it seems to have no effect (see the example command after this list).
  3. How: You can find the entire execution pipeline in the Google Colab link at the beginning of this comment.
  4. Why: Because otherwise I cannot continue with my goal of embedding the model into a TPU.
  5. Resources: None beyond this repo.
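
For reference, a parameter replacement file like the JSON above is normally passed to onnx2tf with the -prf (--param_replacement_file) option, along the lines of the following; the file name is a placeholder:

onnx2tf -i $PATH_TO_MY_MODEL -prf replace.json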

Please, I am struggling a lot with this. Your tool is the most promising one I have found after intensive research, but I don't think I fully understand how to use it and get the most out of it. If you can help me, I would really appreciate it!

Thank you so much in advance!

PS: I re-post the links to my .onnx model: https://we.tl/t-HJjXDINMND and my Colab notebook: https://colab.research.google.com/drive/1StDa10u2DytLO_8IivYMMUNuI_dndLpr?usp=sharing

Update: I think I have solved it with the -ois parameter. At first I was trying !onnx2tf -i $PATH_TO_MY_MODEL -ois data:1,3,1024,608 because "data" is the name used in the README, but I have just realized that it has to be the name of the ONNX input, in my case "input", so with !onnx2tf -i $PATH_TO_MY_MODEL -ois input:1,3,1024,608 it works.
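
In case it helps anyone else, a minimal sketch for checking the actual input names and shapes of an ONNX file (using the onnx Python package; "model.onnx" is a placeholder path):

import onnx

model = onnx.load("model.onnx")
# Graph inputs that are not initializers are the real model inputs.
initializer_names = {init.name for init in model.graph.initializer}
for graph_input in model.graph.input:
    if graph_input.name in initializer_names:
        continue
    dims = [d.dim_param or d.dim_value for d in graph_input.type.tensor_type.shape.dim]
    print(graph_input.name, dims)  # e.g. "input" with its (possibly symbolic) dimensions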

However, I still don't fully understand the outputs, since I don't know whether the following is a problem I have to solve, and if so, how:

Summary on the non-converted ops:
---------------------------------
 * Accepted dialects: tfl, builtin, func
 * Non-Converted Ops: 68, Total Ops 138, % non-converted = 49.28 %
 * 68 ARITH ops

- arith.constant:   68 occurrences  (f32: 52, i32: 16)

The last command you tried is correct. There is nothing we engineers can do about that log, which TensorFlow displays for no good reason.

onnx2tf -i model.onnx -cotof -ois input:1,3,1024,608

The outputs of ONNX and TFLite agree with each other with an error of less than 1e-4.
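
For reference, a rough sketch of the kind of check that -cotof automates; the file names, the input name, and the NCHW/NHWC layouts below are assumptions, not taken from this issue:

import numpy as np
import onnxruntime as ort
import tensorflow as tf

x = np.random.rand(1, 3, 1024, 608).astype(np.float32)

# ONNX side (NCHW input).
sess = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
onnx_out = sess.run(None, {"input": x})[0]

# TFLite side (NHWC input).
interpreter = tf.lite.Interpreter(model_path="saved_model/model_float32.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
interpreter.set_tensor(inp["index"], x.transpose(0, 2, 3, 1))
interpreter.invoke()
tfl_out = interpreter.get_tensor(interpreter.get_output_details()[0]["index"])

# Layouts usually differ between the two models, so transpose back before comparing.
print(np.abs(onnx_out - tfl_out.transpose(0, 3, 1, 2)).max())  # expected to stay below 1e-4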

By the way, your model cannot be converted to a model for edgetpu.

Hi, thanks for your quick response! So, is it usual to have some operations "not converted"? And why can't my model be converted to a model for edgetpu? :/

So, is it usual to have some operations "not converted"?

I don't know. I don't even know what "not converted" means. Ask for help on the TensorFlow forum. The internal specifications of TensorFlow are outside the scope of this repository.

why my model cannot be converted to a model for edgetpu?

Simply, the input resolution is too large. Check the TensorFlow and EdgeTPU documentation yourself.

The Edge TPU documentation about model requirements (https://coral.ai/docs/edgetpu/models-intro/#model-requirements) doesn't say anything about input resolution. It only requires all operations to be INT8 and to be in the list of supported operations, and the whole thing is supposed to weigh less than 2 GB (the memory of the TPU), but a 3x1024x608 image in INT8 only occupies 1.78 MiB. Can you share with me the part of the documentation where they talk about the input resolution, please?

Can you share with me the part of the documentation where they talk about the input resolution, please?

Needless to say, I have undocumented knowledge in addition to what is documented. I've been working on TensorFlow and EdgeTPU far longer than any of you, and I've arrived at that answer by actually converting models and failing, and by looking through issues all over the place.

Quite simply, Google engineers do not attempt to answer questions seriously. For specifications not listed in their documentation, there is no option but to read the source code of the TensorFlow and EdgeTPU runtimes. However, I do not remember or track the location of every single piece of relevant source code, because the source code has changed significantly over the years.

Okay, so the point is that you think it will fail based on your expertise, right? I respect your expertise since you have done a great job with this library, and I am not calling your response into question. I just want to understand why the model could fail if a 3x1024x608 image weighs less than 2 MB and the memory of the TPU is 2 GB. I only want to understand, since I am interested in making it work.

Thank you!

I asked the same question as you did in the TPU issues, but they do not answer.

You should first try the conversion with the input resolution reduced to 10% of the original size before asking me any questions.

I'm not a Google engineer. Other than reading Google's internal code, I can only make assumptions about everything.

I'll say it again. I don't know everything.

Yes, I don't know why Google doesn't provide good documentation and code for TPU usage.

I have tried the command !onnx2tf -i $PATH_TO_MY_MODEL -ois input:1,3,1024,608 -coto -oiqt

But the actual error is not related to resolution:

...
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/lite/python/optimize/calibrator.py", line 101, in _feed_tensors
    for sample in dataset_gen():
  File "/usr/local/lib/python3.10/dist-packages/onnx2tf/onnx2tf.py", line 1435, in representative_dataset_gen
    yield_data_dict[model_input_name] = normalized_calib_data.astype(np.float32)
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/framework/tensor.py", line 256, in __getattr__

AttributeError: EagerTensor object has no attribute 'astype'. 
        If you are looking for numpy-related methods, please run the following:
        tf.experimental.numpy.experimental_enable_numpy_behavior()
      . Did you mean: 'dtype'?

I don't understand why such an error would occur even though it is a numpy.ndarray variable. Try downgrading to tensorflow==2.14.0.

lol. I reproduced it. Obviously, this is a bug in TensorFlow. Wait a few hours to add the workaround fix to onnx2tf.

EagerTensor object has no attribute 'astype'. 
        If you are looking for numpy-related methods, please run the following:
        tf.experimental.numpy.experimental_enable_numpy_behavior()
      
  File "/home/b920405/.local/lib/python3.10/site-packages/tensorflow/python/framework/tensor.py", line 256, in __getattr__
    raise AttributeError(
  File "/home/b920405/git/onnx2tf/onnx2tf/onnx2tf.py", line 1436, in representative_dataset_gen
    yield_data_dict[model_input_name] = normalized_calib_data.astype(np.float32)
  File "/home/b920405/.local/lib/python3.10/site-packages/tensorflow/lite/python/optimize/calibrator.py", line 101, in _feed_tensors
    for sample in dataset_gen():
  File "/home/b920405/.local/lib/python3.10/site-packages/tensorflow/lite/python/optimize/calibrator.py", line 254, in calibrate
    self._feed_tensors(dataset_gen, resize_input=True)
  File "/home/b920405/.local/lib/python3.10/site-packages/tensorflow/lite/python/convert_phase.py", line 215, in wrapper
    raise error from None  # Re-throws the exception.
  File "/home/b920405/.local/lib/python3.10/site-packages/tensorflow/lite/python/convert_phase.py", line 215, in wrapper
    raise error from None  # Re-throws the exception.
  File "/home/b920405/.local/lib/python3.10/site-packages/tensorflow/lite/python/lite.py", line 735, in _quantize
    calibrated = calibrate_quantize.calibrate(
  File "/home/b920405/.local/lib/python3.10/site-packages/tensorflow/lite/python/lite.py", line 1037, in _optimize_tflite_model
    model = self._quantize(
  File "/home/b920405/.local/lib/python3.10/site-packages/tensorflow/lite/python/convert_phase.py", line 215, in wrapper
    raise error from None  # Re-throws the exception.
  File "/home/b920405/.local/lib/python3.10/site-packages/tensorflow/lite/python/convert_phase.py", line 215, in wrapper
    raise error from None  # Re-throws the exception.
  File "/home/b920405/.local/lib/python3.10/site-packages/tensorflow/lite/python/lite.py", line 1332, in _convert_from_saved_model
    return self._optimize_tflite_model(
  File "/home/b920405/.local/lib/python3.10/site-packages/tensorflow/lite/python/lite.py", line 1465, in convert
    return self._convert_from_saved_model(graph_def)
  File "/home/b920405/.local/lib/python3.10/site-packages/tensorflow/lite/python/lite.py", line 1093, in _convert_and_export_metrics
    result = convert_func(self, *args, **kwargs)
  File "/home/b920405/.local/lib/python3.10/site-packages/tensorflow/lite/python/lite.py", line 1139, in wrapper
    return self._convert_and_export_metrics(convert_func, *args, **kwargs)
  File "/home/b920405/git/onnx2tf/onnx2tf/onnx2tf.py", line 1449, in convert
    tflite_model = converter.convert()
  File "/home/b920405/git/onnx2tf/onnx2tf/onnx2tf.py", line 2327, in main
    model = convert(
  File "/home/b920405/git/onnx2tf/onnx2tf/onnx2tf.py", line 2381, in <module>
    main()
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main (Current frame)
    return _run_code(code, main_globals, None,
AttributeError: EagerTensor object has no attribute 'astype'. 
        If you are looking for numpy-related methods, please run the following:
        tf.experimental.numpy.experimental_enable_numpy_behavior()
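
A minimal sketch of the kind of workaround that sidesteps this crash (assuming the calibration sample can arrive as a tf EagerTensor instead of a numpy.ndarray; the actual fix committed to onnx2tf may look different):

import numpy as np
import tensorflow as tf

def to_float32_ndarray(x):
    # Convert an EagerTensor to numpy first, so .astype() is never called
    # on a tf.Tensor (which has no such attribute).
    if isinstance(x, tf.Tensor):
        x = x.numpy()
    return np.asarray(x, dtype=np.float32)

# Inside representative_dataset_gen, the normalized calibration sample would be
# routed through to_float32_ndarray() before being yielded to the calibrator.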

Okay, thank you so much!

By the way, have you wondered why Google / the TPU team have not provided good documentation / tutorials about TPU usage? Maybe they want users to use Google Cloud Platform and forget about infrastructure, rather than sell individual TPUs to individuals :/

They are a super talented group of engineers, but they have no interest in documentation.

Yes, but Google (not only the engineers, the company in general) wants to earn money, and if people don't know how to use its products, they won't use them and they won't buy them. The only reason that comes to my mind is that they prefer the GCP business, idk.

onnx2tf -i model.onnx -oiqt -qt per-tensor -ois input:1,3,1024,608

model_full_integer_quant.tflite.zip

Thank you so much!

I would like to take this opportunity to ask you a couple of questions that have come up about the usage of two params for achieving an INT8 quantized model (I know that on the EdgeTPU you can either use the fully quantized model or leave the inputs/outputs in float, at the cost of a conversion on every inference):

  1. custom_input_op_name_np_data_path: The README says the .npy image is expected to be in the range [0, 1]. And you pass a mean and std that, when used inside the convert function to normalize, can shift the input to any range (even negative values). Okay so far, but once the model is converted to INT8 .tflite, should the inputs to that model be in the range ([0, 1] - mean) / std?
  2. input_output_quant_dtype: The previous question is easy and only for confirmation purposes, but this one is trickier, I think. This param can be "int8" or "uint8". Does that mean that if I pass, for example, "int8", the inputs expected by the fully quantized tflite model should be of dtype int8? And, related to the custom_input_op_name_np_data_path param, should I then pass the custom_input_op_name_np_data_path data in the range [0, 1] or as int8? I guess [0, 1], and internally it is mapped to INT8, but then, at inference time with the fully quantized model, when the model expects INT8 / UINT8 inputs, how do I normalize my data from its original range to the range expected by the model, replicating what would be done internally when using the not-fully-quantized model?

I hope I have expressed myself well. If not, I will try to ask in another way. If you could help me with this, I would appreciate it very much. Thank you!

This question has nothing to do with onnx2tf; it's basic quantization. Quantization using the -cind option of onnx2tf and quantization directly using TFLiteConverter are the same operation. I am not your teacher, and I am not going to answer any more essentially irrelevant questions.

https://github.com/PINTO0309/onnx2tf/issues?q=is%3Aissue+is%3Aclosed+int8

The normalization method and its scope will vary depending on whether your model entrance includes a normalization layer or not, and on the content of the input data. If a normalization layer is included in the model, mean=0.0 and std=1.0 should be used.

https://www.tensorflow.org/lite/performance/post_training_quantization

https://www.tensorflow.org/lite/performance/quantization_spec

Inside onnx2tf, the tensor passed with the -cind option is not divided by 255.0. Therefore, you only need to divide all the calibration data by 255.0 in advance. However, this applies to RGB input data in the range 0-255; for non-image data, dividing by 255 is wrong.

https://github.com/PINTO0309/onnx2tf?tab=readme-ov-file#9-int8-quantization-of-models-with-multiple-inputs-requiring-non-image-data
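
A minimal sketch of how calibration data could be prepared under those rules (the file names, input name, resolution, and mean/std values are placeholders, and the NHWC layout of the .npy is an assumption; check the README section linked above for the exact -cind format):

import glob
import numpy as np
from PIL import Image

# Collect representative RGB images, resize them to the model input resolution,
# and divide by 255.0 in advance (onnx2tf does not do this for -cind data).
samples = []
for path in sorted(glob.glob("calib_images/*.png")):
    img = Image.open(path).convert("RGB").resize((608, 1024))  # (width, height)
    samples.append(np.asarray(img, dtype=np.float32) / 255.0)

calib = np.stack(samples)  # assumed shape: (N, 1024, 608, 3), NHWC
np.save("calib_data.npy", calib)

# The file would then be passed together with mean/std, e.g.:
#   onnx2tf -i model.onnx -ois input:1,3,1024,608 -oiqt -qt per-tensor \
#     -cind "input" "calib_data.npy" "[[[[0.0,0.0,0.0]]]]" "[[[[1.0,1.0,1.0]]]]"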

I know; the only thing I want to know is how to calibrate future inference data for the fully INT8-quantized model in the same way it was done during the model export. But if you are not my teacher, it's okay, one love!

Okay, I know how to do it!

You have to scale with the scale and zero_point params that are stored in interpreter.get_input_details()[0]["quantization"]. I was forgetting the theory; now it's clear!

All the logic can be found in: https://www.tensorflow.org/lite/performance/post_training_integer_quant#run_the_tensorflow_lite_models
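
For completeness, a minimal sketch of that scaling at inference time (the model path and dummy image are placeholders, and it assumes the input/output tensors of the tflite file are actually quantized, i.e. scale != 0; the normalization must match what was used during export):

import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="model_full_integer_quant.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

# Placeholder for a real frame, resized to the model input resolution.
image = np.random.randint(0, 256, (1024, 608, 3), dtype=np.uint8)

# Normalize exactly as during calibration (here: divide RGB data by 255.0).
x = image.astype(np.float32) / 255.0

# Quantize to the integer input dtype using the stored scale / zero_point.
in_scale, in_zero_point = inp["quantization"]
x_q = np.round(x / in_scale + in_zero_point).astype(inp["dtype"])

interpreter.set_tensor(inp["index"], x_q[np.newaxis, ...])
interpreter.invoke()

# Dequantize the integer output back to float.
y_q = interpreter.get_tensor(out["index"])
out_scale, out_zero_point = out["quantization"]
y = (y_q.astype(np.float32) - out_zero_point) * out_scale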