OnnxSlim support
inisis opened this issue · comments
inisis commented
Feature request
Hi, we have developed a tool called onnxslim, which can slim (simplify) exported ONNX models.
```bash
pip install onnxslim
```

```bash
onnxslim raw_onnx_model slimmed_onnx_model --skip_fusion_patterns FusionGelu  # older onnxruntime versions may not support Gelu
```

```python
import onnx
from onnxslim import slim

onnx_model = "your_onnx_model.onnx"
slimmed_model = slim(onnx_model)
onnx.save(slimmed_model, "slimmed_onnx_model.onnx")
```
Motivation
I want to slim ONNX models so we can achieve better inference performance. I have tested the case below, and after onnxslim we get roughly a 3% performance gain.
```python
import time

import requests
from PIL import Image
from optimum.onnxruntime import ORTModelForImageClassification
from transformers import AutoFeatureExtractor

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

preprocessor = AutoFeatureExtractor.from_pretrained("optimum/vit-base-patch16-224")
model = ORTModelForImageClassification.from_pretrained("optimum/vit-base-patch16-224")
inputs = preprocessor(images=image, return_tensors="pt")

# Warmup phase
warmup_runs = 5
actual_runs = 100
for _ in range(warmup_runs):
    outputs = model(**inputs)

# Actual timing phase
start_time = time.time()
for _ in range(actual_runs):
    outputs = model(**inputs)
end_time = time.time()

# Calculate average time per run
total_time = end_time - start_time
average_time_per_run = total_time / actual_runs
print("Average time per run: {:.6f} seconds".format(average_time_per_run))

logits = outputs.logits
```
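As an aside, `time.time()` can have coarse resolution on some platforms; the warmup-then-measure loop above could be factored into a small helper built on the monotonic `time.perf_counter` clock. A minimal sketch (the `benchmark` helper name is my own, not part of optimum or onnxslim):

```python
import time

def benchmark(fn, warmup=5, runs=100):
    """Call fn() `warmup` times untimed, then return the mean wall time
    in seconds over `runs` timed calls, using time.perf_counter."""
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) / runs

# usage with the model above:
# avg = benchmark(lambda: model(**inputs))
```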
With the slimmed model: Average time per run: 0.246237 seconds
Without slimming: Average time per run: 0.253707 seconds
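The ~3% figure follows directly from the two averages above:

```python
baseline = 0.253707  # average seconds per run, original model
slimmed = 0.246237   # average seconds per run, slimmed model

# relative reduction in per-run latency
speedup = (baseline - slimmed) / baseline
print(f"{speedup:.1%}")  # -> 2.9%
```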
Your contribution
I can submit a PR, and help slim existing ONNX models.