arrayfire / arrayfire-python

Python bindings for ArrayFire: A general purpose GPU library.

Home Page: https://arrayfire.com

Matrix inner product calc speed (OpenCL) is almost exactly the same as in numpy+mkl

xtipacko opened this issue

xtipacko commented

Matrix inner product calculation speed (OpenCL) is almost exactly the same as in numpy+mkl.
I expected higher speed...

[image]

Your bench, though, shows higher speed:

[image]

What is the problem?

import numpy as np
import arrayfire as af
from time import time as now

af.set_backend('opencl')
af.info()
start = now()
a = np.array(np.random.random((3000,3000)), dtype=np.float32)
a2 = np.array(np.random.random((3000,2000)), dtype=np.float32)
print(f'Numpy array generation: {now() - start:.3f}s\n\n')

start = now()
b = af.Array(a.ctypes.data, a.shape, a.dtype.char)
b2 = af.Array(a2.ctypes.data, a2.shape, a2.dtype.char)
print(f'Numpy to ArrayFire conversion: {now() - start:.3f}s\n\n')


start = now()
for i in range(20):
    xa = a.dot(a2)
print(f'Numpy+MKL two 3000x3000 matrix inner product 20x: {now() - start:.3f}s\n\n')
start = now()
for i in range(20):
    xb = af.matmul(b,b2)
print(f'ArrayFire OpenCL two 3000x3000 matrix inner product 20x: {now() - start:.3f}s')
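
A note on the timing loop above: GPU libraries such as ArrayFire generally queue work asynchronously, so stopping the clock right after the last `af.matmul` call may measure only part of the work. Below is a minimal sketch of a timing helper that waits for the device before reading the clock. The helper name `time_op` and its parameters are hypothetical; `af.sync` is the ArrayFire Python call assumed to block until queued device work has finished.

```python
from time import perf_counter


def time_op(op, sync=None, repeats=20):
    """Return the average wall-clock seconds per call of op().

    sync, if given, is called before the clock starts and before it stops,
    so queued device work is included in the measurement.
    """
    op()  # warm-up call: keeps one-time costs (e.g. kernel compilation) out of the timing
    if sync is not None:
        sync()
    start = perf_counter()
    for _ in range(repeats):
        op()
    if sync is not None:
        sync()  # wait for all queued work before stopping the clock
    return (perf_counter() - start) / repeats


# CPU-only demonstration; no synchronisation is needed here.
avg = time_op(lambda: sum(range(10000)), repeats=5)
print(f'Average time per call: {avg:.6f}s')
```

With the arrays from the script above, the ArrayFire case would be timed as `time_op(lambda: af.matmul(b, b2), sync=af.sync)` and the NumPy case as `time_op(lambda: a.dot(a2))`.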

@xtipacko Can you run bench_blas.py and show the output?

xtipacko commented

@pavanky for some reason it shows me this, even though I've set backend to opencl:

[image: screenshot of an Intel MKL FATAL ERROR]

@xtipacko I think the blas operations get offloaded to the CPU for integrated GPUs. Can you try setting AF_OPENCL_CPU_OFFLOAD to 1 as an environment variable and running the code again?
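
For readers who want to try this from inside the benchmark script rather than the shell, here is a minimal sketch. The helper name `set_cpu_offload` is hypothetical, and the sketch assumes ArrayFire reads `AF_OPENCL_CPU_OFFLOAD` when the library is loaded or first initialised, which is why the variable is set before `import arrayfire`. Which value is appropriate depends on the machine; later in this thread, 0 is the value that worked for xtipacko.

```python
import os


def set_cpu_offload(enabled):
    """Set AF_OPENCL_CPU_OFFLOAD for this process and return the stored value.

    Assumption: this must run before arrayfire is imported to have any effect.
    """
    os.environ['AF_OPENCL_CPU_OFFLOAD'] = '1' if enabled else '0'
    return os.environ['AF_OPENCL_CPU_OFFLOAD']


set_cpu_offload(False)  # ask ArrayFire not to offload BLAS calls to the CPU

try:
    import arrayfire as af
    af.set_backend('opencl')
    af.info()  # prints the selected device, as in the outputs in this thread
except Exception as exc:  # arrayfire or its OpenCL backend may be missing
    print(f'ArrayFire is not available here: {exc}')
```

On Windows the same thing can be done in the console with `set AF_OPENCL_CPU_OFFLOAD=0` before launching the script.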

xtipacko commented

@pavanky Thanks a lot! I've set AF_OPENCL_CPU_OFFLOAD to 0 and my own bench works great.
If I set AF_OPENCL_CPU_OFFLOAD to 1, I get the same Intel MKL FATAL ERROR. But whatever, you solved my problem :)

bench_blas hung, though :)
Here is the output:

my bench:

C:\Users\HOME-MAIN\Desktop>bench3.py
ArrayFire v3.5.0 (OpenCL, 64-bit Windows, build 05999f3)
[0] AMD: Devastator, 2048 MB
-1- AMD: AMD A10-6800K APU with Radeon(tm) HD Graphics , 7371 MB
Numpy array generation: 3.660s

Numpy to ArrayFire conversion: 0.382s

Numpy+MKL two 10000x10000 matrix inner product 20x: 498.824s

ArrayFire OpenCL two 10000x10000 matrix inner product 20x: 22.376s

bench_blas:

C:\Users\HOME-MAIN\Desktop>bench_blas.py
ArrayFire v3.5.0 (OpenCL, 64-bit Windows, build 05999f3)
[0] AMD: Devastator, 2048 MB
-1- AMD: AMD A10-6800K APU with Radeon(tm) HD Graphics , 7371 MB
Benchmark N x N matrix multiply on arrayfire
Time taken for 128 x 128: 5.7453 Gflops
Time taken for 256 x 256: 42.4715 Gflops
Time taken for 384 x 384: 35.0587 Gflops
Time taken for 512 x 512: 13.7370 Gflops
Time taken for 640 x 640: 15.9494 Gflops
Time taken for 768 x 768: 77.2308 Gflops
Time taken for 896 x 896: 16.0805 Gflops
Time taken for 1024 x 1024: 16.0036 Gflops
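
The lines above are labelled "Time taken" but report throughput in Gflops. As a rough aid for reading them, the sketch below converts between the two, assuming the usual count of 2·N³ floating-point operations for an N x N matrix product. The function names are hypothetical and this is not taken from bench_blas.py itself.

```python
def matmul_gflops(n, seconds, repeats=1):
    """Throughput in Gflops for `repeats` N x N matrix products taking `seconds` in total."""
    flops = 2.0 * n ** 3 * repeats  # assumed cost: 2*N^3 operations per product
    return flops / seconds / 1e9


def matmul_seconds(n, gflops):
    """Seconds for one N x N matrix product at the given Gflops rate."""
    return 2.0 * n ** 3 / (gflops * 1e9)


# The 1024 x 1024 line above (16.0036 Gflops) expressed as time per product.
print(f'{matmul_seconds(1024, 16.0036):.4f}s per 1024 x 1024 product')

# The custom bench above: twenty 10000 x 10000 products in 22.376s.
print(f'{matmul_gflops(10000, 22.376, repeats=20):.1f} Gflops')
```

Under the same assumption, the custom bench figure works out to well over 1000 Gflops, far above the rates bench_blas reports for the smaller sizes. That may mean the custom timing loop did not wait for the device to finish, so the two results are not directly comparable.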

@xtipacko Ah sorry. I had the default options mixed up :) Closing this issue for now.