onnx / models

A collection of pre-trained, state-of-the-art models in the ONNX format

Home Page:http://onnx.ai/models/

Quantized model test data on GPU

mszhanyi opened this issue

Ask a Question

Since the GPU machines of CI have been upgraded from NV6 to T4, it looks like quantized model test data on GPU should be added too.
> Hardware support is required to achieve better performance with quantization on GPUs. You need a device that supports Tensor Core int8 computation, like T4 or A100.

https://onnxruntime.ai/docs/performance/quantization.html#quantization-on-gpu
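One plausible source of the CPU-with-VNNI vs. GPU discrepancy is that int8 kernels on different hardware paths can round and accumulate differently. A minimal numpy sketch of symmetric per-tensor int8 quantization (an illustration only, not the ONNX Runtime implementation) shows how rounding alone bounds the per-element error:

```python
import numpy as np

def quantize_int8(x, scale):
    # Symmetric per-tensor quantization: scale, round, clamp to int8 range.
    return np.clip(np.round(x / scale), -128, 127).astype(np.int8)

def dequantize(q, scale):
    # Map int8 values back to float32.
    return q.astype(np.float32) * scale

x = np.array([0.1, -0.25, 1.3], dtype=np.float32)
scale = np.float32(1.3 / 127)   # map the max magnitude to 127
q = quantize_int8(x, scale)
x_hat = dequantize(q, scale)
err = np.abs(x - x_hat)         # each element within half a quantization step
```

Because each backend may fuse, reorder, or saturate these steps differently, bit-exact agreement between execution providers is generally not guaranteed, which is why reference test data tied to one EP can mismatch another.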

But it looks like the test result on CPU with VNNI is different from the result on GPU. Is that expected? @yufenglee
If it's expected, shall we add test data on GPU? @jcwchen @snnn

And to my surprise, quantized model tests on GPU (T4) produce the same result as the old test data generated without VNNI.
So the tests passed with the incorrect test data.

Just came back from my vacation -- thanks for bringing this up. For now, at least in the ONNX Model Zoo repo, I slightly lean toward keeping only a single valid test_data_set, created by the CPU EP, for simplicity. That also reduces the burden on contributors.

> And to my surprise, quantized model tests on GPU (T4) produce the same result as the old test data generated without VNNI.
> So the tests passed with the incorrect test data.

I would like to understand more about the result difference for quantized models among:

  1. CPU without VNNI
  2. CPU with VNNI
  3. GPU without T4
  4. GPU with T4

As you mentioned, it is surprising that 1 = 4 ≠ 2. Perhaps we can make a further decision once we have confirmed whether this result is expected.
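One way to confirm the 1 = 4 ≠ 2 observation is to compare each configuration's outputs against the reference with an explicit tolerance, the way a test harness typically does with `np.allclose`. A minimal sketch, with entirely hypothetical output values standing in for the four configurations:

```python
import numpy as np

def outputs_match(expected, actual, rtol=1e-3, atol=1e-4):
    # True when every output tensor agrees within tolerance,
    # mirroring an assert_allclose-style per-output check.
    return all(
        np.allclose(a, e, rtol=rtol, atol=atol)
        for e, a in zip(expected, actual)
    )

# Hypothetical single-output results for one test input.
ref_cpu_no_vnni = [np.array([0.102, -0.246], dtype=np.float32)]  # config 1
cpu_vnni        = [np.array([0.104, -0.243], dtype=np.float32)]  # config 2
gpu_t4          = [np.array([0.102, -0.246], dtype=np.float32)]  # config 4

same_1_4 = outputs_match(ref_cpu_no_vnni, gpu_t4)    # matches reference
same_1_2 = outputs_match(ref_cpu_no_vnni, cpu_vnni)  # exceeds tolerance
```

Running all four configurations through a check like this would show whether the difference is a real kernel-path divergence or merely noise within the comparison tolerance.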