intel / intel-npu-acceleration-library

Intel® NPU Acceleration Library


NPU utilization for maths problems

KhDenys opened this issue · comments

Hi, I'm interested in using NPU processors / AI accelerators for pure maths problems, and I wonder if I can use this library in such scenarios. Are there plans to develop the API further (add more functions that can run on the NPU, build pipelines, etc.)?

Thanks

I was thinking of accelerating some n-body problems but never got the time. What kind of problems are you most interested in? Do you have an API in mind that would facilitate your work? I'm very interested in this use case.

@alessandropalla I've worked with mathematical models of quantum dots, where theoretically infinite-dimensional matrices are used to describe a particle/quantum system. In practice we use 16x16 matrices, since the computations already take from a few minutes to a few hours. AFAIK computational number theory also relies on huge matrices and long arithmetic.

So it would be great to have something like the numpy API, plus the ability to create computation pipelines that run on the NPU without context switching.

Thanks

So if it is matrix-matrix multiplication that you want to accelerate, we can help you. The NPU can give you a massive acceleration over numpy for such operations. Here, for example, is code to compare NPU vs numpy on a [1024 x 1024] x [1024 x 1024] matrix multiplication:

import intel_npu_acceleration_library as npu_lib
import numpy as np
import tqdm
import time

def npu_vs_numpy(inC, outC, batch, n_iters=500):
    npu_times_ms, numpy_times_ms = [], []

    X = np.random.uniform(-1, 1, (batch, inC)).astype(np.float16)
    W = np.random.uniform(-1, 1, (outC, inC)).astype(np.float16)

    # Compile a matmul kernel for the NPU with fixed shapes
    mm = npu_lib.backend.MatMul(inC, outC, batch)

    # Run the actual computation
    print(f"Running {n_iters} iterations of matmul with shape ({batch}, {inC}) x ({outC}, {inC})")
    print("Running NPU...")
    for _ in tqdm.tqdm(range(n_iters)):
        start = time.perf_counter()
        mm.run(X, W)
        npu_times_ms.append((time.perf_counter() - start) * 1000)

    # numpy computes X @ W.T so both backends perform the same operation
    W_T = W.T

    print("Running Numpy...")
    for _ in tqdm.tqdm(range(n_iters)):
        start = time.perf_counter()
        np.matmul(X, W_T)
        numpy_times_ms.append((time.perf_counter() - start) * 1000)

    print(f"NPU runtime: {np.mean(npu_times_ms):.2f} ms ± {2 * np.std(npu_times_ms):.2f} ms")
    print(f"Numpy runtime: {np.mean(numpy_times_ms):.2f} ms ± {2 * np.std(numpy_times_ms):.2f} ms")

npu_vs_numpy(1024, 1024, 1024, n_iters=50)

NPU runtime: 1.66 ms ± 0.00 ms
Numpy runtime: 2395.62 ms ± 199.42 ms

As you can see, the speedup can be very significant.

If you need to accelerate other operations, you can use the NNFactory class to build your pipeline. Let me know if I can help you.

@alessandropalla Thanks for your example, I think it is very useful. Unfortunately, however, I'm still only planning to buy a laptop or PC with a next-gen Intel CPU (with 48 TOPS), so I can't test the performance differences yet.

Also, I have no clue how to perform all the other matrix operations (add, subtract, inverse, eigenvalues) or how to build them with the API. Could you point me to some resource where I can find more information?

Can you give an example of the numpy code that you'd like to get accelerated?

@alessandropalla basically I want to implement long arithmetic for integers (Schönhage-Strassen's algorithm); it's for pure math problems. For the quantum problems, it requires all of numpy's linear algebra module.

Again, I can implement all the needed pieces myself once it is clear to me how to use the NPU for matrix operations, since currently only matmul has been described. Is it possible to have such an API?
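As background, the connection between long arithmetic and matmul can be made concrete: the digit products of a big-integer multiplication (before carry propagation) form a linear convolution, and a convolution can be written as a Toeplitz matrix times a vector, which is exactly the kind of operation a matmul kernel accelerates. The pure-numpy sketch below shows the schoolbook O(n²) reduction, not Schönhage-Strassen's FFT-based method, and the function names are illustrative, not part of any library:

```python
import numpy as np

def conv_matrix(a, n_b):
    """Toeplitz matrix T of shape (len(a)+n_b-1, n_b) such that
    T @ b equals the linear convolution of a and b."""
    n_a = len(a)
    T = np.zeros((n_a + n_b - 1, n_b), dtype=np.int64)
    for j in range(n_b):
        T[j:j + n_a, j] = a
    return T

def bigint_mul(a_digits, b_digits, base=10):
    """Multiply two integers given as little-endian digit lists."""
    T = conv_matrix(np.array(a_digits, dtype=np.int64), len(b_digits))
    # Digit-product convolution as a matrix-vector product; this is the
    # step a matmul accelerator could offload.
    raw = T @ np.array(b_digits, dtype=np.int64)
    # Sequential carry propagation
    out, carry = [], 0
    for d in raw:
        carry, digit = divmod(int(d) + carry, base)
        out.append(digit)
    while carry:
        carry, digit = divmod(carry, base)
        out.append(digit)
    return out

# 123 * 456 = 56088, as little-endian digits
print(bigint_mul([3, 2, 1], [6, 5, 4]))  # [8, 8, 0, 6, 5]
```

The carry step is inherently sequential, so only the convolution part maps onto matrix hardware; Schönhage-Strassen replaces that convolution with FFTs over a ring rather than a dense matmul.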

I really appreciate your help, thank you!

I think that there is a very simple and elegant solution to this. Numpy lets you write a custom array dispatcher (https://numpy.org/doc/stable/user/basics.dispatch.html#) that allows the NPU to be used from pure numpy. I think it will be very handy :) We have already done something similar for torch, which has the same mechanism. I'll keep you updated
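To illustrate the mechanism being proposed, here is a toy sketch of numpy's dispatch protocol. `FakeNpuArray` is hypothetical (not part of this library); a real backend would hand the operands to an NPU kernel where the comment indicates. Note that `np.matmul` is a ufunc, so it dispatches through `__array_ufunc__`, while non-ufunc functions like `np.concatenate` go through `__array_function__`:

```python
import numpy as np

class FakeNpuArray:
    """Toy wrapper showing how a plain np.matmul call can be intercepted."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        if ufunc is np.matmul and method == "__call__":
            a, b = (i.data if isinstance(i, FakeNpuArray) else np.asarray(i)
                    for i in inputs)
            # A real backend would offload here, e.g. to a compiled NPU kernel
            return FakeNpuArray(np.matmul(a, b))
        return NotImplemented  # let numpy handle anything we don't support

x = FakeNpuArray(np.eye(2, dtype=np.float32))
y = np.array([[1.0, 2.0], [3.0, 4.0]], dtype=np.float32)
z = np.matmul(x, y)  # plain numpy call, routed to our handler
print(type(z).__name__)  # FakeNpuArray
```

The appeal of this design is exactly what the thread asks for: user code stays pure numpy, and the backend swap happens transparently at dispatch time.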

@alessandropalla sounds very interesting. Thanks for the update. Looking forward to being able to use the NPU natively in numpy!