arrayfire / arrayfire-python

Python bindings for ArrayFire: A general purpose GPU library.

Home Page: https://arrayfire.com

Slower performance compared to numpy

shyams2 opened this issue

To test the speed advantage for simple operations such as element-wise addition and assignment to an array, I ran the following code to compare NumPy with the OpenCL backend of ArrayFire.

import time
import numpy as np
import arrayfire as af
af.set_backend("opencl")

# This is a benchmark test to compare numpy and arrayfire:

print("The following line displays the ArrayFire build and device details:")
af.info()


aNumPy = np.random.rand(100, 100)
bNumPy = np.random.rand(100, 100)

np_time_start = time.time()

for i in range(1000000):
  cNumPy = aNumPy + bNumPy

np_time_end     = time.time()
np_time_elapsed = np_time_end - np_time_start

print("numpy implementation run took time =", np_time_elapsed," seconds")

aArrayFire = af.Array(aNumPy.ctypes.data, aNumPy.shape, aNumPy.dtype.char)
bArrayFire = af.Array(bNumPy.ctypes.data, bNumPy.shape, bNumPy.dtype.char)

kernel_compilation_time_start = time.time()

cArrayFire = aArrayFire + bArrayFire
af.eval(cArrayFire)
af.sync()

kernel_compilation_time_end     = time.time()
kernel_compilation_time_elapsed = kernel_compilation_time_end - kernel_compilation_time_start

print("Kernel compilation complete. Compilation time = ", kernel_compilation_time_elapsed)

af_time_start = time.time()

for i in range(1000000):
  cArrayFire = aArrayFire + bArrayFire
  af.eval(cArrayFire)
 
af.sync()
af_time_end     = time.time()
af_time_elapsed = af_time_end - af_time_start

print("arrayfire implementation run took time =", af_time_elapsed," seconds")

I'm getting this as the output:

The following line displays the ArrayFire build and device details:
ArrayFire v3.4.2 (OpenCL, 64-bit Linux, build 2da9967)
[0] INTEL   : Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz, 28135 MB
numpy implementation run took time = 8.37395715713501  seconds
Kernel compilation complete. Compilation time =  0.035042762756347656
arrayfire implementation run took time = 60.80797052383423  seconds

Is there something wrong here?

@ShyamSS-95 I was just talking to @mchandra about this over chat. The reason is that, at these sizes, the overhead of launching an OpenCL kernel is much greater than the cost of the operation itself.

Try increasing the array sizes and decreasing the number of iterations.
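One way to act on this suggestion is to sweep over several array sizes while keeping the total number of elements processed roughly constant. The following is only a sketch: the helper names `seconds_per_call` and `sweep`, the list of sizes, and the `target_elements` budget are illustrative assumptions, not part of the original benchmark. It uses `af.randu` to create the ArrayFire arrays directly on the device, and it assumes the selected OpenCL device supports double precision.

```python
import time


def seconds_per_call(fn, repeats, sync=None):
    """Average wall-clock seconds per call of fn over `repeats` calls."""
    fn()  # warm-up call, so one-time costs such as kernel compilation are not timed
    if sync is not None:
        sync()
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    if sync is not None:
        sync()  # wait for queued device work before stopping the clock
    return (time.perf_counter() - start) / repeats


def sweep(sizes=(100, 512, 2048, 4096), target_elements=10**8):
    """Hypothetical sweep: time a + b for NumPy and ArrayFire at each size."""
    # Imported here so the timing helper above can be used without either library.
    import numpy as np
    import arrayfire as af

    af.set_backend("opencl")
    for n in sizes:
        # Fewer iterations for larger arrays, so each size does similar total work.
        repeats = max(1, target_elements // (n * n))
        a_np, b_np = np.random.rand(n, n), np.random.rand(n, n)
        a_af = af.randu(n, n, dtype=af.Dtype.f64)
        b_af = af.randu(n, n, dtype=af.Dtype.f64)

        t_np = seconds_per_call(lambda: a_np + b_np, repeats)
        t_af = seconds_per_call(lambda: af.eval(a_af + b_af), repeats, sync=af.sync)
        print(f"{n} x {n}: numpy {t_np * 1e6:.1f} us/iter, "
              f"arrayfire {t_af * 1e6:.1f} us/iter")
```

Calling `sweep()` prints one line per size. If the launch-overhead explanation holds, the ArrayFire time per iteration should stay roughly flat for small arrays and only start to grow with the array size once the arrays are large.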

This is what I get at 4096 x 4096 for 1000 iterations:

$ AF_OPENCL_DEFAULT_DEVICE=2 python examples/helloworld/helloworld.py 
ERROR: GLFW wasn't able to initalize
The following line displays the ArrayFire build and device details:
ArrayFire v3.5.0 (OpenCL, 64-bit Linux, build 06e605b0)
-0- AMD     : Hawaii, 8143 MB
-1- NVIDIA  : GeForce GTX 950, 1994 MB
[2] INTEL   : Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz, 15985 MB
-3- AMD     : Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz, 15985 MB
numpy implementation run took time = 48.15489220619202  seconds
Kernel compilation complete. Compilation time =  0.09143519401550293
arrayfire implementation run took time = 37.75742530822754  seconds
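To make the two sets of numbers easier to compare, the totals can be divided by the iteration counts. The short script below does only that arithmetic on the figures quoted in this thread; the variable and function names are illustrative. Note that the two runs come from different machines and different ArrayFire versions, so the comparison between them is indicative at best.

```python
# Totals copied from the two runs quoted in this thread.
runs = [
    # (label, iterations, numpy_seconds, arrayfire_seconds)
    ("100 x 100", 1_000_000, 8.37395715713501, 60.80797052383423),
    ("4096 x 4096", 1_000, 48.15489220619202, 37.75742530822754),
]


def per_iteration_us(total_seconds, iterations):
    """Convert a total run time into microseconds per loop iteration."""
    return total_seconds / iterations * 1e6


summary = {}
for label, iterations, np_seconds, af_seconds in runs:
    np_us = per_iteration_us(np_seconds, iterations)
    af_us = per_iteration_us(af_seconds, iterations)
    summary[label] = (np_us, af_us)
    print(f"{label}: numpy {np_us:.1f} us/iter, "
          f"arrayfire {af_us:.1f} us/iter, "
          f"arrayfire/numpy = {af_us / np_us:.2f}")
```

At 100 x 100, each ArrayFire iteration costs about 61 microseconds against about 8 for NumPy, which is consistent with a fixed per-launch cost dominating the work on 10,000 elements. At 4096 x 4096, both take tens of milliseconds per iteration and ArrayFire comes out ahead.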