intel / intel-npu-acceleration-library

Intel® NPU Acceleration Library

Confused about why the Llama example can run on Win10

nigue3025 opened this issue · comments

Hi,
I have a machine with an Intel Core Ultra 5 CPU running Windows 10.

Even though I know that installation of the NPU driver is currently only supported on Win11 and Ubuntu 22.04, I still tried the provided Llama example using intel_npu_acceleration_library.

Surprisingly, the example works fine, although the token generation speed is slow.

I am wondering whether the NPU is actually activated in this case, or whether the example simply runs on the CPU.
(There is no NPU monitor on Win10 that would let me check whether the NPU is active. By comparison, running a CUDA program on a PC without an NVIDIA GPU produces an error message saying that no NVIDIA GPU is available.)

Any suggestion or reply is appreciated!

is it simply run on CPU?
Yes, you are right: it still works, but it is very slow.

So this library is flexible enough to run on different machines.

By the way, is there any state variable I can check to tell whether the current model is on the CPU or the NPU?

If you do not have an NPU or the NPU driver, the library falls back to CPU (using the OpenVINO CPU device). This is not very fast, but it is very handy for GitHub Actions CI/CD. Yes, there are functions such as npu_available() doc and get_driver_version() doc.

Let me know if you have a different API in mind for understanding where each layer of a model gets implemented. I'm working on an extension of the compile method to allow users more freedom, so any feedback in that sense is really appreciated.
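To make the fallback behavior concrete, here is a minimal sketch of a device check built around the npu_available() function mentioned above. The import path and the detect_device helper name are assumptions based on the library's docs, not a verbatim snippet from them; the try/except also covers machines where the package is not installed at all:

```python
def detect_device() -> str:
    """Return "NPU" if the NPU and its driver are usable, else "CPU".

    Mirrors the library's own fallback: no NPU or no driver means the
    OpenVINO CPU device is used instead.
    """
    try:
        # Assumed import path; adjust to match the installed package layout.
        from intel_npu_acceleration_library.backend import npu_available
    except ImportError:
        return "CPU"  # library not installed, so nothing runs on the NPU
    return "NPU" if npu_available() else "CPU"


if __name__ == "__main__":
    print(f"Models will run on: {detect_device()}")
```

On the Win10 machine from this issue, a check like this would report "CPU", which matches the slow-but-working behavior described above.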

The functions are exactly what I want, and they work fine!
The documentation is great.
I'll try upgrading my OS and driver to harness the NPU's compute power.
I really appreciate your help!