intel / intel-npu-acceleration-library

Intel® NPU Acceleration Library


[Question]NPU is slower than CPU when computing some types of matmul.

Septend-fun opened this issue · comments

Hi experts. I tested an int8 matmul with shape (1,4096)×(4096,4096) and measured a latency of 0.985 ms on the NPU. But when I ran the same matmul (implemented with OpenVINO) on the CPU, the latency was 0.75 ms, so the NPU is slower than the CPU. Is this normal?
I didn't test the int8 matmul on the NPU through OpenVINO, because it failed.

Is there a way I can implement a matmul op on the NPU directly myself, without this repo or OpenVINO? I noticed that matmul ops seem to be implemented in the NPU driver.
Also, is there a tool I can use to measure the NPU's peak bandwidth?

Test environment: Intel Core Ultra 7 155H
Test code: https://github.com/intel/intel-npu-acceleration-library/blob/v1.1.0/script/profile_matmul.py
Test cmd: python profile_matmul.py -b 1 -c 4096 -k 4096 -q
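
For reference, the measurement can be reproduced standalone through the library's documented `compile()` API instead of the profiling script (a minimal sketch, assuming `intel_npu_acceleration_library.compile(model, dtype=torch.int8)` works as shown in the project README; the warm-up and iteration counts here are arbitrary choices):

```python
import time
import torch
import intel_npu_acceleration_library

# Single 4096x4096 linear layer, matching the (1,4096)x(4096,4096) matmul
model = torch.nn.Linear(4096, 4096, bias=False).eval()

# Compile for the NPU with int8 weight quantization, per the README usage
npu_model = intel_npu_acceleration_library.compile(model, dtype=torch.int8)

x = torch.rand(1, 4096)

with torch.no_grad():
    for _ in range(10):   # warm-up: first runs include graph compilation
        npu_model(x)
    n = 100
    t0 = time.perf_counter()
    for _ in range(n):
        npu_model(x)
    t1 = time.perf_counter()

print(f"mean latency: {(t1 - t0) / n * 1e3:.3f} ms")
```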

At batch size 1, matmuls are bandwidth bound, since there is no weight reuse: every weight is read from memory and used exactly once. If you try bigger batches you'll see the NPU gain the upper hand fairly quickly. Also, what driver version are you using? The latest driver brought a significant speedup to quantized matmul operations.
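
To make the bandwidth argument concrete, compare operations performed to weight bytes moved. For a (B,4096)×(4096,4096) int8 matmul, traffic is dominated by the 16 MiB weight matrix, so arithmetic intensity grows linearly with the batch size:

```python
# Rough arithmetic intensity of a (B,4096)x(4096,4096) int8 matmul.
# Weight traffic dominates at small B: 4096*4096 int8 values = 16 MiB,
# and each weight is used once per batch row.
for B in (1, 16, 128):
    ops = 2 * B * 4096 * 4096    # one multiply + one accumulate per weight per row
    weight_bytes = 4096 * 4096   # int8: one byte per weight
    print(f"B={B:4d}: {ops / weight_bytes:.0f} ops/byte")
```

At B=1 this is only ~2 ops/byte, so the matmul streams weights at memory speed and the compute units idle; by B=128 it is ~256 ops/byte and the kernel becomes compute bound, which is where the NPU pulls ahead.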

Thanks for your reply. I'm using the latest driver. And is there a tool I can use to measure the NPU's peak bandwidth?
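
As a rough sanity check, absent a dedicated bandwidth tool: since the batch-1 run is dominated by streaming the weight matrix once per inference, an effective bandwidth can be backed out from the measured latency. A sketch using the numbers from this thread:

```python
# Effective bandwidth implied by the batch-1 int8 matmul above:
# the 4096x4096 int8 weight matrix must be read once per inference.
weight_bytes = 4096 * 4096   # ~16.8 MB of int8 weights
latency_s = 0.985e-3         # measured NPU latency reported in this thread
print(f"{weight_bytes / latency_s / 1e9:.1f} GB/s effective")  # ~17 GB/s
```

This is a lower bound on achievable bandwidth (it ignores activations and any compute overlap), but it gives a quick feel for how close the batch-1 case sits to the memory limit.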

Thank you a lot. I'll try it.