daquexian / onnx-simplifier

Simplify your onnx model

Excessive bloating of ONNX files due to over-efficient conversion of "Tile" to constants (Protocol Buffers .onnx > 2GB)

PINTO0309 opened this issue · comments

1. Description

The tool optimizes the model structure very aggressively, and in most situations this works well. In some patterns, however, the optimization bloats the final model beyond Protocol Buffers' 2 GB file size limit and fails. The conditions are as follows.

  1. ONNX with Tile OP in use
  2. Contains a combination of Tile and GatherElements generated from PyTorch's torch.meshgrid
  3. Models that generate a large number of patches of images
  4. Tile OP is optimized and a constant tensor is stored inside the model with INT64

The above pattern reproduces the symptom reliably. Specifically, an 8 MB ONNX file may exceed 2 GB and abort when optimized.

In the pre-optimized model, we found the following two Tile OPs where the problem occurs. These two Tiles generate, respectively, the Float32 input values and the INT64 indices passed to the next connected GatherElements.
(screenshot)

This figure shows how my original workaround avoided generating a large number of INT64 constants. If I ran onnx-simplifier without taking any action, the points marked with arrows would generate 2.0 GB of constant values, exceeding Protocol Buffers' 2 GB file size limit and causing onnx to abort.
(screenshot)
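A back-of-envelope calculation makes the blow-up concrete. The shapes below are hypothetical and purely illustrative (the real tensors in the model differ), but they show how a single folded INT64 Tile output reaches the 2 GB protobuf ceiling:

```python
# Hypothetical, illustrative shapes: folding a Tile that broadcasts a small
# INT64 index tensor up to (1, 256, 1024, 1024) materializes every element
# as a constant initializer inside the .onnx file.
elems = 1 * 256 * 1024 * 1024        # element count after the Tile is folded
folded_bytes = elems * 8             # INT64 = 8 bytes per element
print(folded_bytes / 2**30, "GiB")   # → 2.0 GiB

# A serialized Protocol Buffers message is capped at 2**31 - 1 bytes, so this
# single folded constant alone already exceeds the limit.
print(folded_bytes > 2**31 - 1)      # → True
```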

2. My workaround

Before optimizing the model with onnx-simplifier, I inserted an operation that downcasts from INT64 to INT32 just before the Tile OP, using a model-surgery utility (onnx-graphsurgeon). I implemented this measure knowing it is not ideal, since the current onnx-simplifier provides no optional flag to disable the full constantization of the Tile OP. However, it works well.
(screenshot)

import numpy as np
import onnx
import onnx_graphsurgeon as gs

MODEL = 'hitnet_xl_sf_finalpass_from_tf_720x1280.onnx'

graph = gs.import_onnx(onnx.load(MODEL))

for graph_node in graph.nodes:
    if graph_node.name == 'Expand_653':
        """
        graph_node.o()

        Tile_654 (Tile)
            Inputs: [
                Variable (896): (shape=None, dtype=None)
                Variable (893): (shape=None, dtype=None)
            ]
            Outputs: [
                Variable (897): (shape=None, dtype=None)
            ]
        """
        # Insert a Cast that downcasts the INT64 output of Expand_653
        # to INT32 before it reaches the following Tile OP (Tile_654).
        cast_out = gs.Variable("cast_out", dtype=np.int32)
        cast_node = gs.Node(op="Cast", inputs=graph_node.outputs, outputs=[cast_out])
        cast_node.attrs["to"] = onnx.TensorProto.INT32
        graph.nodes.append(cast_node)

        # Rewire the Tile OP to consume the casted tensor.
        graph_node.o().inputs[0] = cast_node.outputs[0]
        break

# Drop dangling nodes, re-sort, and save with refreshed shape inference.
graph.cleanup().toposort()
new_graph = gs.export_onnx(graph)
inferred_graph = onnx.shape_inference.infer_shapes(new_graph)
onnx.save(inferred_graph, f"{MODEL.split('.')[0]}_cast.onnx")

3. Feature Request

Therefore, it would be great if you could add an option to reduce the overall model size, so that onnx-simplifier's advanced optimization can be applied to more types of models. In particular, I would be very happy with a flag to disable the constantization of Tile OPs, as mentioned above, and an option to downcast INT64 to INT32 for selected OPs.

I know that some OPs accept only INT64 inputs, but I am convinced this tool would be even better if, with the exception of those OPs, constants could be stored with the smallest sufficient precision, such as Float32 or INT32, while checking for overflow due to the downcast.
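As a sketch of what such an overflow check could look like (a hypothetical helper of my own, not an existing onnx-simplifier API), assuming NumPy:

```python
import numpy as np

def safe_downcast_int64(arr: np.ndarray) -> np.ndarray:
    """Return an INT32 copy of arr when no value overflows, else arr unchanged.

    Hypothetical helper illustrating the overflow check suggested above.
    """
    info = np.iinfo(np.int32)
    if arr.size and (arr.min() < info.min or arr.max() > info.max):
        return arr  # at least one value would not fit in INT32; keep INT64
    return arr.astype(np.int32)

# Small index values downcast safely; out-of-range values are left alone.
print(safe_downcast_int64(np.array([0, 720 * 1280], dtype=np.int64)).dtype)  # → int32
print(safe_downcast_int64(np.array([2**40], dtype=np.int64)).dtype)          # → int64
```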

I personally investigated the logic of onnx-simplifier, onnx-optimizer and onnx in order to submit a pull request myself, but it was very difficult to follow because there were so many different things to investigate.

4. Remarks

I have begun building my own model compression tool to test this concept. It has only been two days since I started, so there are likely still many bugs due to insufficient validation. The tool is intended to further compress the overall size of a model after it has been optimized with onnx-simplifier, although I originally wanted this behavior incorporated into onnx-simplifier itself.

"A very simple tool that compresses the overall size of the ONNX model by aggregating duplicate constant values as much as possible. Added option to downcast from Float64 to Float32 and INT64 to INT32 to attempt size compression. Simple Constant value Shrink for ONNX. "
https://github.com/PINTO0309/scs4onnx

5. Sample Model

  1. Model before optimization with onnx-simplifier (8.4MB)
    hitnet_xl_sf_finalpass_from_tf_720x1280.onnx.zip
  2. Model regenerated by downcasting INT64 to INT32 just before Tile OP (8.4MB)
    hitnet_xl_sf_finalpass_from_tf_720x1280_cast.onnx.zip
  3. Model that avoids the problem of exceeding the Protocol Buffers file size limit of 2GB by casting to INT32 just before Tile OP (1.2GB)
    hitnet_xl_sf_finalpass_from_tf_720x1280_cast_opt.onnx.zip

Many thanks for the excellent analysis! It is absolutely a problem.

> In particular, I would be very happy to add a flag to disable the constantization of Tile OPs.

I can add such a flag soon.

I disabled the constantization of Tile OPs internally (instead of providing a flag). Could you please try the latest 0.3.9 version?

@daquexian
Thank you for your quick response!

Yes, the model size bloat has indeed been resolved. However, the structure of the model appears to be much the same as before optimization. Did you add an implementation that minimizes structural optimization when Tile is included?

I have not yet done enough testing on other models, but I imagine that when Tile is not included, the model will still be optimized fully.

hitnet_xl_sf_finalpass_from_tf_720x1280_disabel_tile_opt.onnx.zip

$ onnxsim \
hitnet_xl_sf_finalpass_from_tf_720x1280.onnx \
hitnet_xl_sf_finalpass_from_tf_720x1280_disabel_tile_opt.onnx

Simplifying...
Finish! Here is the difference:
┏━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┓
┃                 ┃ Original Model ┃ Simplified Model ┃
┡━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━┩
│ Abs             │ 4              │ 4                │
│ Add             │ 40             │ 39               │
│ ArgMin          │ 1              │ 1                │
│ Cast            │ 92             │ 9                │
│ Clip            │ 7              │ 6                │
│ Concat          │ 61             │ 39               │
│ Constant        │ 274            │ 0                │
│ ConstantOfShape │ 16             │ 16               │
│ Conv            │ 103            │ 103              │
│ ConvTranspose   │ 8              │ 8                │
│ Div             │ 24             │ 6                │
│ Expand          │ 5              │ 5                │
│ Floor           │ 3              │ 3                │
│ Gather          │ 55             │ 14               │
│ GatherElements  │ 7              │ 7                │
│ LeakyRelu       │ 107            │ 107              │
│ Mul             │ 20             │ 19               │
│ Pad             │ 11             │ 11               │
│ Range           │ 3              │ 1                │
│ ReduceL1        │ 1              │ 1                │
│ ReduceMin       │ 1              │ 1                │
│ ReduceSum       │ 4              │ 4                │
│ Reshape         │ 38             │ 36               │
│ Resize          │ 2              │ 2                │
│ Shape           │ 61             │ 18               │
│ Slice           │ 26             │ 26               │
│ Sub             │ 68             │ 15               │
│ Tile            │ 3              │ 3                │
│ Transpose       │ 15             │ 15               │
│ Unsqueeze       │ 112            │ 32               │
│ Model Size      │ 8.0MiB         │ 8.7MiB           │
└─────────────────┴────────────────┴──────────────────┘

I see that ConstantOfShape is no longer fused. No major problem, though. It appears that shape estimation fails for most of the OPs because ConstantOfShape is no longer fused.
(screenshot)

@PINTO0309 Thanks for your try.

> It appears that shape estimation fails for most of the OPs because ConstantOfShape is no longer fused.

Will the problem be alleviated if onnxsim fuses all ConstantOfShape whose size < 1M?

@daquexian

> Will the problem be alleviated if onnxsim fuses all ConstantOfShape whose size < 1M?

Yes, I would be very happy. This is very meaningful from a different perspective than just reducing the overall size of the model.

Having the model's shapes fully estimated is very effective when converting to TensorRT or other frameworks. Other frameworks may fail to import a model when some OPs have failed shape estimation. For example, in the figure below, you can see that the OP shapes in the middle of the model are blank. In my experience, the probability of errors in other frameworks is very high, especially if the shape is not fixed in a Reshape OP.
(screenshot)

> Yes, I would be very happy. This is very meaningful from a different perspective than just reducing the overall size of the model.

:) I will do it tomorrow (if I have some spare time)

> Yes, I would be very happy. This is very meaningful from a different perspective than just reducing the overall size of the model.

I temporarily enabled the constant folding of ConstantOfShape in version 0.3.10. It is a bit hard in the current onnxsim to fold only ConstantOfShape whose output is smaller than some threshold; I'll implement that in the next version -- 0.4.0.
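The size check itself could look like the following sketch (my illustration of the heuristic discussed above, not onnxsim's actual implementation; the 1 MiB threshold and the helper name are hypothetical):

```python
import numpy as np

ONE_MIB = 1024 * 1024  # hypothetical threshold, per the "size < 1M" discussion

def should_fold_constant_of_shape(output_shape, dtype=np.float32) -> bool:
    """Fold a ConstantOfShape only if the materialized tensor stays small."""
    nbytes = int(np.prod(output_shape, dtype=np.int64)) * np.dtype(dtype).itemsize
    return nbytes < ONE_MIB

print(should_fold_constant_of_shape((1, 3, 256, 256)))                # → True (768 KiB)
print(should_fold_constant_of_shape((1, 256, 1024, 1024), np.int64))  # → False (2 GiB)
```

Small ConstantOfShape outputs would still be folded, preserving shape inference downstream, while the 2 GiB case from this issue would be left as-is.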

Thank you! I will close this issue, since the discussion has now moved beyond the original issue.

I appreciate your efforts very much and use onnx-simplifier every day. 😃

Amazing! v0.3.10
(screenshots)

┏━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┓
┃                 ┃ Original Model ┃ Simplified Model ┃
┡━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━┩
│ Abs             │ 4              │ 4                │
│ Add             │ 40             │ 39               │
│ ArgMin          │ 1              │ 1                │
│ Cast            │ 92             │ 7                │
│ Clip            │ 7              │ 6                │
│ Concat          │ 61             │ 17               │
│ Constant        │ 274            │ 0                │
│ ConstantOfShape │ 16             │ 0                │
│ Conv            │ 103            │ 103              │
│ ConvTranspose   │ 8              │ 8                │
│ Div             │ 24             │ 0                │
│ Expand          │ 5              │ 2                │
│ Floor           │ 3              │ 3                │
│ Gather          │ 55             │ 0                │
│ GatherElements  │ 7              │ 7                │
│ LeakyRelu       │ 107            │ 107              │
│ Mul             │ 20             │ 11               │
│ Pad             │ 11             │ 0                │
│ Range           │ 3              │ 0                │
│ ReduceL1        │ 1              │ 1                │
│ ReduceMin       │ 1              │ 1                │
│ ReduceSum       │ 4              │ 4                │
│ Reshape         │ 38             │ 8                │
│ Resize          │ 2              │ 2                │
│ Shape           │ 61             │ 0                │
│ Slice           │ 26             │ 15               │
│ Sub             │ 68             │ 13               │
│ Tile            │ 3              │ 3                │
│ Transpose       │ 15             │ 4                │
│ Unsqueeze       │ 112            │ 2                │
│ Model Size      │ 8.0MiB         │ 9.2MiB           │
└─────────────────┴────────────────┴──────────────────┘

@PINTO0309 Thanks! I'm very happy that you like it. :)