intel / intel-npu-acceleration-library

Intel® NPU Acceleration Library


NPU utilization for maths problems

KhDenys opened this issue · comments

Hi, I'm interested in using NPU processors / AI accelerators for pure maths problems, and I wonder if I can use this library in such scenarios. Are there plans to develop the API further (add more functions that can run on the NPU, build pipelines, etc.)?

Thanks

I was thinking of accelerating some n-body problems but never got the time. What kind of problems are you most interested in? Do you have an API in mind that would facilitate your work? I'm very interested in this use case.

@alessandropalla I've worked with mathematical models of quantum dots, where theoretically infinite-dimensional matrices are used to describe a particle/quantum system. In practice we use 16x16 matrices, since the computations already take from a few minutes to a few hours. AFAIK computational number theory also relies on huge matrices and long arithmetic.

So it would be great to have something like the numpy API, plus the ability to create computation pipelines that run on the NPU without context switching.

Thanks

So if it is matrix-matrix multiplication that you want to accelerate, we can help you. The NPU can give you a massive acceleration over numpy for such operations. Here, for example, is code to compare NPU vs numpy on a [1024 x 1024] x [1024 x 1024] matrix multiplication:

import intel_npu_acceleration_library as npu_lib
import numpy as np
import tqdm
import time

def npu_vs_numpy(inC, outC, batch, n_iters=500):
    npu_times_ms, numpy_times_ms = [], []

    X = np.random.uniform(-1, 1, (batch, inC)).astype(np.float16)
    W = np.random.uniform(-1, 1, (outC, inC)).astype(np.float16)

    # Compile a matmul kernel for the NPU with fixed shapes
    mm = npu_lib.backend.MatMul(inC, outC, batch)

    # Run the actual computation
    print(f"Running {n_iters} iterations of matmul with shape ({batch}, {inC}) x ({outC}, {inC})")
    print("Running NPU...")
    for _ in tqdm.tqdm(range(n_iters)):
        start = time.perf_counter()
        mm.run(X, W)
        npu_times_ms.append((time.perf_counter() - start) * 1000)

    # numpy computes X @ W.T so both backends perform the same operation
    W_T = W.T

    print("Running Numpy...")
    for _ in tqdm.tqdm(range(n_iters)):
        start = time.perf_counter()
        np.matmul(X, W_T)
        numpy_times_ms.append((time.perf_counter() - start) * 1000)

    print(f"NPU runtime: {np.mean(npu_times_ms):.2f} ms ± {2 * np.std(npu_times_ms):.2f} ms")
    print(f"Numpy runtime: {np.mean(numpy_times_ms):.2f} ms ± {2 * np.std(numpy_times_ms):.2f} ms")

npu_vs_numpy(1024, 1024, 1024, n_iters=50)

NPU runtime: 1.66 ms ± 0.00 ms
Numpy runtime: 2395.62 ms ± 199.42 ms

As you can see, the speedup can be very significant.

If you need to accelerate other operations, you can use the NNFactory class to build your pipeline. Let me know if I can help you.

@alessandropalla Thanks for your example, I think it is very useful. Unfortunately, however, I'm still only planning to buy a laptop or PC with a next-gen Intel CPU (with 48 TOPS), so I can't test the performance differences yet.

Also, I have no clue how to perform all the other matrix operations (add, subtract, inverse, eigenvalues) or how to build them with the API. Could you point me to some resource where I can find more information?

Can you give an example of the numpy code that you'd like to get accelerated?

@alessandropalla basically I want to implement long arithmetic for integers (Schönhage-Strassen's algorithm); it's for pure math problems. For the quantum problems, it requires all of numpy's linear algebra module.

Again, I can implement all the needed pieces myself once it is clear to me how to use the NPU for matrix operations, since currently only matmul has been described. Is it possible to have such an API?
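As background, the connection between long arithmetic and matmul can be made concrete: the digit products of a big-integer multiplication (before carry propagation) form a linear convolution, and a convolution can be written as a Toeplitz matrix times a vector, which is exactly the kind of operation a matmul kernel accelerates. The pure-numpy sketch below shows the schoolbook O(n²) reduction, not Schönhage-Strassen's FFT-based method, and the function names are illustrative, not part of any library:

```python
import numpy as np

def conv_matrix(a, n_b):
    """Toeplitz matrix T of shape (len(a)+n_b-1, n_b) such that
    T @ b equals the linear convolution of a and b."""
    n_a = len(a)
    T = np.zeros((n_a + n_b - 1, n_b), dtype=np.int64)
    for j in range(n_b):
        T[j:j + n_a, j] = a
    return T

def bigint_mul(a_digits, b_digits, base=10):
    """Multiply two integers given as little-endian digit lists."""
    T = conv_matrix(np.array(a_digits, dtype=np.int64), len(b_digits))
    # Digit-product convolution as a matrix-vector product; this is the
    # step a matmul accelerator could offload.
    raw = T @ np.array(b_digits, dtype=np.int64)
    # Sequential carry propagation
    out, carry = [], 0
    for d in raw:
        carry, digit = divmod(int(d) + carry, base)
        out.append(digit)
    while carry:
        carry, digit = divmod(carry, base)
        out.append(digit)
    return out

# 123 * 456 = 56088, as little-endian digits
print(bigint_mul([3, 2, 1], [6, 5, 4]))  # [8, 8, 0, 6, 5]
```

The carry step is inherently sequential, so only the convolution part maps onto matrix hardware; Schönhage-Strassen replaces that convolution with FFTs over a ring rather than a dense matmul.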

I really appreciate your help, thank you!

I think that there is a very simple and elegant solution to this. Numpy lets you write a custom array dispatcher (https://numpy.org/doc/stable/user/basics.dispatch.html#) that allows the NPU to be used from pure numpy. I think it will be very handy :) We have already done something similar for torch, which has the same mechanism. I'll keep you updated
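To illustrate the mechanism being proposed, here is a toy sketch of numpy's dispatch protocol. `FakeNpuArray` is hypothetical (not part of this library); a real backend would hand the operands to an NPU kernel where the comment indicates. Note that `np.matmul` is a ufunc, so it dispatches through `__array_ufunc__`, while non-ufunc functions like `np.concatenate` go through `__array_function__`:

```python
import numpy as np

class FakeNpuArray:
    """Toy wrapper showing how a plain np.matmul call can be intercepted."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        if ufunc is np.matmul and method == "__call__":
            a, b = (i.data if isinstance(i, FakeNpuArray) else np.asarray(i)
                    for i in inputs)
            # A real backend would offload here, e.g. to a compiled NPU kernel
            return FakeNpuArray(np.matmul(a, b))
        return NotImplemented  # let numpy handle anything we don't support

x = FakeNpuArray(np.eye(2, dtype=np.float32))
y = np.array([[1.0, 2.0], [3.0, 4.0]], dtype=np.float32)
z = np.matmul(x, y)  # plain numpy call, routed to our handler
print(type(z).__name__)  # FakeNpuArray
```

The appeal of this design is exactly what the thread asks for: user code stays pure numpy, and the backend swap happens transparently at dispatch time.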

@alessandropalla sounds very interesting. Thanks for the update. Looking forward to being able to use the NPU natively in numpy!