onnx / models

A collection of pre-trained, state-of-the-art models in the ONNX format

Home Page:http://onnx.ai/models/

Quantized model test data on GPU

mszhanyi opened this issue

Ask a Question

Since the GPU machines of CI have been upgraded from NV6 to T4, it looks like quantized model test data on GPU should be added too.
> Hardware support is required to achieve better performance with quantization on GPUs. You need a device that supports Tensor Core int8 computation, like T4 or A100.

https://onnxruntime.ai/docs/performance/quantization.html#quantization-on-gpu
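One plausible source of the CPU-with-VNNI vs. GPU discrepancy is that int8 kernels on different hardware paths can round and accumulate differently. A minimal numpy sketch of symmetric per-tensor int8 quantization (an illustration only, not the ONNX Runtime implementation) shows how rounding alone bounds the per-element error:

```python
import numpy as np

def quantize_int8(x, scale):
    # Symmetric per-tensor quantization: scale, round, clamp to int8 range.
    return np.clip(np.round(x / scale), -128, 127).astype(np.int8)

def dequantize(q, scale):
    # Map int8 values back to float32.
    return q.astype(np.float32) * scale

x = np.array([0.1, -0.25, 1.3], dtype=np.float32)
scale = np.float32(1.3 / 127)   # map the max magnitude to 127
q = quantize_int8(x, scale)
x_hat = dequantize(q, scale)
err = np.abs(x - x_hat)         # each element within half a quantization step
```

Because each backend may fuse, reorder, or saturate these steps differently, bit-exact agreement between execution providers is generally not guaranteed, which is why reference test data tied to one EP can mismatch another.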

But it looks like the test result on CPU with VNNI is different from the result on GPU. Is that expected? @yufenglee
If it's expected, shall we add test data on GPU? @jcwchen @snnn

And to my surprise, quantized model tests on GPU (T4) produce the same result as the old test data generated without VNNI.
So the tests passed with the incorrect test data.

Just came back from my vacation -- thanks for bringing this up. For now, at least in the ONNX Model Zoo repo, I slightly lean toward keeping only a single valid test_data_set, created by the CPU EP, for simplicity. That also reduces the burden on contributors.

> And to my surprise, quantized model tests on GPU (T4) produce the same result as the old test data generated without VNNI.
> So the tests passed with the incorrect test data.

I would like to understand more about the result difference for quantized models among:

  1. CPU without VNNI
  2. CPU with VNNI
  3. GPU without T4
  4. GPU with T4

As you mentioned, it is surprising that 1 = 4 ≠ 2. Perhaps we can make a further decision once we have confirmed whether this result is expected.
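One way to confirm the 1 = 4 ≠ 2 observation is to compare each configuration's outputs against the reference with an explicit tolerance, the way a test harness typically does with `np.allclose`. A minimal sketch, with entirely hypothetical output values standing in for the four configurations:

```python
import numpy as np

def outputs_match(expected, actual, rtol=1e-3, atol=1e-4):
    # True when every output tensor agrees within tolerance,
    # mirroring an assert_allclose-style per-output check.
    return all(
        np.allclose(a, e, rtol=rtol, atol=atol)
        for e, a in zip(expected, actual)
    )

# Hypothetical single-output results for one test input.
ref_cpu_no_vnni = [np.array([0.102, -0.246], dtype=np.float32)]  # config 1
cpu_vnni        = [np.array([0.104, -0.243], dtype=np.float32)]  # config 2
gpu_t4          = [np.array([0.102, -0.246], dtype=np.float32)]  # config 4

same_1_4 = outputs_match(ref_cpu_no_vnni, gpu_t4)    # matches reference
same_1_2 = outputs_match(ref_cpu_no_vnni, cpu_vnni)  # exceeds tolerance
```

Running all four configurations through a check like this would show whether the difference is a real kernel-path divergence or merely noise within the comparison tolerance.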