Slower performance compared to numpy

Question

shyams2 opened this issue 8 years ago

Shyam Sundar Sankaran commented 8 years ago

To measure the speedup for simple operations such as element-wise addition and array assignment, I ran the following code, which compares NumPy with the OpenCL backend of ArrayFire.

import time
import numpy as np
import arrayfire as af
af.set_backend("opencl")

# This is a benchmark test to compare numpy and arrayfire:

print("The following line displays the ArrayFire build and device details:")
af.info()


aNumPy = np.random.rand(100, 100)
bNumPy = np.random.rand(100, 100)

np_time_start = time.time()

for i in range(1000000):
  cNumPy = aNumPy + bNumPy

np_time_end     = time.time()
np_time_elapsed = np_time_end - np_time_start

print("numpy implementation run took time =", np_time_elapsed," seconds")

aArrayFire = af.Array(aNumPy.ctypes.data, aNumPy.shape, aNumPy.dtype.char)
bArrayFire = af.Array(bNumPy.ctypes.data, bNumPy.shape, bNumPy.dtype.char)

kernel_compilation_time_start = time.time()

cArrayFire = aArrayFire + bArrayFire
af.eval(cArrayFire)
af.sync()

kernel_compilation_time_end     = time.time()
kernel_compilation_time_elapsed = kernel_compilation_time_end - kernel_compilation_time_start

print("Kernel compilation complete. Compilation time = ", kernel_compilation_time_elapsed)

af_time_start = time.time()

for i in range(1000000):
  cArrayFire = aArrayFire + bArrayFire
  af.eval(cArrayFire)
 
af.sync()
af_time_end     = time.time()
af_time_elapsed = af_time_end - af_time_start

print("arrayfire implementation run took time =", af_time_elapsed," seconds")

I'm getting this as the output:

The following line displays the ArrayFire build and device details:
ArrayFire v3.4.2 (OpenCL, 64-bit Linux, build 2da9967)
[0] INTEL   : Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz, 28135 MB
numpy implementation run took time = 8.37395715713501  seconds
Kernel compilation complete. Compilation time =  0.035042762756347656
arrayfire implementation run took time = 60.80797052383423  seconds

Is there something wrong here?

Pavan Yalamanchili · Answer 1 · Wed Feb 08 2017 00:23:30 GMT+0800 (China Standard Time)

@ShyamSS-95 I was just talking to @mchandra about this over chat. The reason is that at these sizes, the kernel launch overhead in OpenCL is much greater than the cost of the operation itself.

Try increasing the array sizes and decreasing the number of iterations.
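One way to make that comparison repeatable is a small timing helper that reports the average time per call. This is only a sketch: `seconds_per_call` and `make_add` are hypothetical names introduced here, and the workload is a pure-Python stand-in so the snippet runs without NumPy or ArrayFire installed.

```python
import time


def seconds_per_call(op, iterations, sync=None, warmup=1):
    """Return the average wall-clock seconds per call of op()."""
    # Warm-up calls keep one-time costs (such as kernel compilation)
    # out of the measured interval.
    for _ in range(warmup):
        op()
    if sync is not None:
        sync()

    start = time.perf_counter()
    for _ in range(iterations):
        op()
    # For asynchronous backends, wait for queued work to finish
    # before stopping the clock.
    if sync is not None:
        sync()
    return (time.perf_counter() - start) / iterations


def make_add(n):
    """Hypothetical stand-in workload: add two lists of n floats."""
    a = [1.0] * n
    b = [2.0] * n
    return lambda: [x + y for x, y in zip(a, b)]


# Many iterations at a small size, few iterations at a large size.
small = seconds_per_call(make_add(100), iterations=2000)
large = seconds_per_call(make_add(100_000), iterations=20)
print("small: %.3e s/call, large: %.3e s/call" % (small, large))
```

With the script from the question, `op` could be something like `lambda: af.eval(aArrayFire + bArrayFire)` with `sync=af.sync`, and the NumPy side would pass `lambda: aNumPy + bNumPy` with no `sync`.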

This is what I get at 4096 x 4096 for 1000 iterations:

$ AF_OPENCL_DEFAULT_DEVICE=2 python examples/helloworld/helloworld.py 
ERROR: GLFW wasn't able to initalize
The following line displays the ArrayFire build and device details:
ArrayFire v3.5.0 (OpenCL, 64-bit Linux, build 06e605b0)
-0- AMD     : Hawaii, 8143 MB
-1- NVIDIA  : GeForce GTX 950, 1994 MB
[2] INTEL   : Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz, 15985 MB
-3- AMD     : Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz, 15985 MB
numpy implementation run took time = 48.15489220619202  seconds
Kernel compilation complete. Compilation time =  0.09143519401550293
arrayfire implementation run took time = 37.75742530822754  seconds
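To see the overhead effect in the numbers above, the totals can be divided by the iteration counts. The sketch below only redoes that arithmetic on the timings reported in this thread. It assumes the 4096 x 4096 NumPy figure also covers 1000 iterations, and the two runs come from different machines, so only the trend is comparable.

```python
# Timings copied from the two runs reported in this thread.
runs = {
    "100 x 100": {
        "iterations": 1_000_000,
        "numpy_s": 8.37395715713501,
        "arrayfire_s": 60.80797052383423,
    },
    "4096 x 4096": {
        "iterations": 1000,  # assumed to apply to both libraries
        "numpy_s": 48.15489220619202,
        "arrayfire_s": 37.75742530822754,
    },
}

per_iter = {}
for label, r in runs.items():
    np_t = r["numpy_s"] / r["iterations"]      # seconds per NumPy addition
    af_t = r["arrayfire_s"] / r["iterations"]  # seconds per ArrayFire addition
    per_iter[label] = {"numpy": np_t, "arrayfire": af_t, "ratio": af_t / np_t}
    print("%-12s numpy %.3e s  arrayfire %.3e s  arrayfire/numpy %.2f"
          % (label, np_t, af_t, af_t / np_t))
```

At 100 x 100 each ArrayFire addition costs roughly 61 microseconds against roughly 8 microseconds for NumPy, which is consistent with a fixed per-launch cost dominating. At 4096 x 4096 each addition takes tens of milliseconds in both libraries, so that fixed cost no longer matters and ArrayFire comes out ahead.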