onnx / models

A collection of pre-trained, state-of-the-art models in the ONNX format

Home Page: http://onnx.ai/models/

MxNet Converted Arcface Model Slow Compared to Provided Arcface Model

tk4218 opened this issue

I have an arcface/resnet100 model that I've trained using InsightFace's MxNet training. For inference, I converted the model to ONNX with the help of https://github.com/linghu8812/tensorrt_inference/blob/master/project/arcface/export_onnx.py.
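For context, a minimal sketch of what an MXNet-to-ONNX export typically looks like with MXNet 1.9's built-in mx2onnx exporter (the linked export_onnx.py script differs in its details; the checkpoint file names and input shape below are placeholders):

```python
# Sketch only: export a symbol/params checkpoint to ONNX with mx2onnx.
# File names and the (1, 3, 112, 112) ArcFace-style input shape are assumptions.
import numpy as np
import mxnet as mx

sym_file = "model-symbol.json"      # hypothetical checkpoint files
params_file = "model-0000.params"
in_shapes = [(1, 3, 112, 112)]
in_types = [np.float32]

mx.onnx.export_model(sym_file, params_file, in_shapes, in_types, "model.onnx")
```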

The inference results from my converted model are correct; however, inference is extremely slow. For reference, I compared it with the arcface model provided in this repository (arcfaceresnet100-8.onnx). Inference with my model takes ~7 seconds, whereas the provided model takes < 1 second.
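The comparison was done roughly like this (a sketch; the model file names and input shape are placeholders for my actual setup):

```python
# Time both models side by side on CPU with onnxruntime.
# File names and the input shape are assumptions.
import time
import numpy as np
import onnxruntime as ort

x = np.random.rand(1, 3, 112, 112).astype(np.float32)

for path in ("model-opt.onnx", "arcfaceresnet100-8.onnx"):
    sess = ort.InferenceSession(path, providers=["CPUExecutionProvider"])
    input_name = sess.get_inputs()[0].name
    sess.run(None, {input_name: x})                # warm-up run
    start = time.perf_counter()
    for _ in range(10):
        sess.run(None, {input_name: x})
    print(path, (time.perf_counter() - start) / 10, "s per run")
```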

When comparing the two models in Netron, all of the nodes, attributes, and input/output shapes are the same (the weights are different, obviously); however, when I run the ONNX Runtime profiler on the two models, there are a few differences. I've attached the profile logs for both models.

profile_arcfaceresnet100-8.txt
profile_model-opt.txt
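For anyone wanting to reproduce these, the profiles were generated roughly like this with ONNX Runtime's built-in profiler (model path and input shape are placeholders):

```python
# Enable per-node profiling and dump a JSON trace after one inference.
# Model path and input shape are assumptions.
import numpy as np
import onnxruntime as ort

opts = ort.SessionOptions()
opts.enable_profiling = True
sess = ort.InferenceSession("model-opt.onnx", opts,
                            providers=["CPUExecutionProvider"])

x = np.random.rand(1, 3, 112, 112).astype(np.float32)
sess.run(None, {sess.get_inputs()[0].name: x})

# Writes the trace with per-node execution times and returns its file name.
print(sess.end_profiling())
```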

There are a few differences in the two logs. Mainly (mine vs. arcfaceresnet100-8):

  • ReorderInput/ReorderOutput operations: 99 vs. 51
  • Conv operations: 103 vs. 152
  • BatchNormalization operations: 51 vs. 2
  • Avg. PRelu time: 995.8μs vs. 129.1μs
  • Avg. Conv time: 12079.6μs vs. 411.9μs

I am not sure what is different about my conversion. It is critical that my converted model run with performance similar to the arcfaceresnet100-8 model. I've tried running my model through simplifiers/optimizers/etc., but with no improvement.
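For example, the simplifier pass I tried looked roughly like this (a sketch assuming the onnxsim package, not my exact script):

```python
# Run the model through onnx-simplifier and save the result.
import onnx
from onnxsim import simplify

model = onnx.load("model.onnx")
model_simp, check = simplify(model)
assert check, "simplified model could not be validated"
onnx.save(model_simp, "model-simplified.onnx")
```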

Here are my environment details:

OS: Linux Ubuntu Server 20.04
Python: 3.8

MxNet version: 1.9.1
ONNXRuntime: 1.14.0
ONNX: 1.13.0
ONNX IR Version: 8
ONNX Opset Version: 18

If anyone could provide insight as to why my model performs slower or why there are differences in execution, that would be extremely helpful.

After some further testing, I manually updated my model to match the IR and opset versions of the arcfaceresnet100-8 model (IR version 3, opset version 8), and that seems to have resolved the node differences in the profiles. I'm now seeing 51 ReorderInput/ReorderOutput, 152 Conv, 2 BatchNormalization, etc.
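For reference, one way to do that downgrade programmatically looks roughly like this (a sketch, not necessarily exactly how I edited the model; the ONNX version converter does not support every op when going down to opset 8, so this only works for graphs whose operators already exist in the older opset):

```python
# Downgrade the opset with the ONNX version converter and set the IR version.
# File names are assumptions; the conversion may fail for unsupported ops.
import onnx
from onnx import version_converter

model = onnx.load("model.onnx")
converted = version_converter.convert_version(model, 8)  # target opset 8
converted.ir_version = 3                                 # match arcfaceresnet100-8
onnx.checker.check_model(converted)                      # optional sanity check
onnx.save(converted, "model-opset8.onnx")
```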

It is pretty clear now that the Conv and PRelu execution times are what is making my model slow; however, I still don't see any differences in those nodes compared with the other model. One thing to note is that the weights of my model are significantly smaller in magnitude (for example, -4.930378685479061e-25 vs. 0.00033268501283600926), but both are float32.

Not sure if the weight values could cause any slowdown, but I'm struggling to find any other difference in my Conv/PRelu nodes that would explain it.
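One thing I may check next is whether the tiny weights push intermediate values into the subnormal float range, which is slow on many CPUs; this is only a hypothesis. A rough sketch for comparing weight magnitudes between the two models (file names are placeholders):

```python
# Compare initializer magnitudes and count subnormal float32 values.
# File names are assumptions; subnormals are just one possible slowdown cause.
import numpy as np
import onnx
from onnx import numpy_helper

for path in ("model-opt.onnx", "arcfaceresnet100-8.onnx"):
    model = onnx.load(path)
    smallest = np.float32(np.inf)
    subnormal = 0
    total = 0
    for init in model.graph.initializer:
        w = numpy_helper.to_array(init).astype(np.float32)
        nz = np.abs(w[w != 0])
        if nz.size:
            smallest = min(smallest, nz.min())
            subnormal += int((nz < np.finfo(np.float32).tiny).sum())
        total += w.size
    print(path, "smallest nonzero |w|:", smallest,
          "subnormal values:", subnormal, "of", total)
```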