OnnxSlim support
inisis opened this issue · comments
inisis commented
Feature request
Hi, we have developed a tool called onnxslim, which can slim (simplify) exported ONNX models.
```bash
pip install onnxslim
```

```bash
onnxslim raw_onnx_model slimmed_onnx_model --skip_fusion_patterns FusionGelu  # older onnxruntime versions may not support Gelu
```

```python
import onnx
from onnxslim import slim

onnx_model = "your_onnx_model.onnx"
slimmed_model = slim(onnx_model)
onnx.save(slimmed_model, "slimmed_onnx_model.onnx")
```
Motivation
I want to slim ONNX models so we can achieve better inference performance. I have tested the case below, and after onnxslim we get roughly a 3% performance gain.
```python
import time

import requests
from PIL import Image
from optimum.onnxruntime import ORTModelForImageClassification
from transformers import AutoFeatureExtractor

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

preprocessor = AutoFeatureExtractor.from_pretrained("optimum/vit-base-patch16-224")
model = ORTModelForImageClassification.from_pretrained("optimum/vit-base-patch16-224")
inputs = preprocessor(images=image, return_tensors="pt")

# Warmup phase
warmup_runs = 5
actual_runs = 100
for _ in range(warmup_runs):
    outputs = model(**inputs)

# Actual timing phase
start_time = time.time()
for _ in range(actual_runs):
    outputs = model(**inputs)
end_time = time.time()

# Calculate average time per run
total_time = end_time - start_time
average_time_per_run = total_time / actual_runs
print("Average time per run: {:.6f} seconds".format(average_time_per_run))

logits = outputs.logits
```
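As an aside, `time.time()` can have coarse resolution on some platforms; the warmup-then-measure loop above could be factored into a small helper built on the monotonic `time.perf_counter` clock. A minimal sketch (the `benchmark` helper name is my own, not part of optimum or onnxslim):

```python
import time

def benchmark(fn, warmup=5, runs=100):
    """Call fn() `warmup` times untimed, then return the mean wall time
    in seconds over `runs` timed calls, using time.perf_counter."""
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) / runs

# usage with the model above:
# avg = benchmark(lambda: model(**inputs))
```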
With the slimmed model: Average time per run: 0.246237 seconds
Without slimming: Average time per run: 0.253707 seconds
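The ~3% figure follows directly from the two averages above:

```python
baseline = 0.253707  # average seconds per run, original model
slimmed = 0.246237   # average seconds per run, slimmed model

# relative reduction in per-run latency
speedup = (baseline - slimmed) / baseline
print(f"{speedup:.1%}")  # -> 2.9%
```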
Your contribution
I can submit a PR, and help slim existing ONNX models.