arrayfire / arrayfire-python

Python bindings for ArrayFire: A general purpose GPU library.

Home Page: https://arrayfire.com

Odd performance trends in conversion from af.Array to np.ndarray

shyams2 opened this issue

I've used this code to time conversions from af.Array to np.ndarray and vice-versa.

When I run the code on the GPU, I get the following times:

$ python test.py 32
ArrayFire v3.6.0 (OpenCL, 64-bit Linux, build d9bc8d7)
[0] NVIDIA: Quadro M1000M, 2047 MB
-1- INTEL: Intel(R) Xeon(R) CPU E3-1505M v5 @ 2.80GHz, 31993 MB
Conversion from af to np:
Time Elapsed for 100 loops = 10.877396583557129

Conversion from np to af:
Kernel compilation time = 0.08670806884765625
Time Elapsed for 100 loops = 8.28329849243164

Additionally, peak memory consumption when run on the GPU is 330 MB.
However, when I run it on the CPU:

$ AF_OPENCL_DEFAULT_DEVICE_TYPE=CPU python test.py 32
ArrayFire v3.6.0 (OpenCL, 64-bit Linux, build d9bc8d7)
-0- NVIDIA: Quadro M1000M, 2047 MB
[1] INTEL: Intel(R) Xeon(R) CPU E3-1505M v5 @ 2.80GHz, 31993 MB
Conversion from af to np:
Time Elapsed for 100 loops = 9.318161487579346

Conversion from np to af:
Kernel compilation time = 0.4476149082183838
Time Elapsed for 100 loops = 41.09346866607666

The memory consumption in this case peaks at 6.5 GB. Is this a driver issue?
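For anyone wanting to reproduce the memory figures: the thread does not say how peak memory was measured, so the following is only one possible approach, using Python's standard `resource` module from inside the script. The helper name `peak_rss_mb` and the `bytearray` stand-in are illustrative assumptions. This reports host memory of the process only, not memory allocated on a GPU.

```python
import resource
import sys


def peak_rss_mb():
    """Peak resident set size of this process so far, in MB (hypothetical helper)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


before = peak_rss_mb()
buffer = bytearray(64 * 1024 * 1024)  # stand-in for the conversion loop being measured
after = peak_rss_mb()
print(f"peak RSS before: {before:.0f} MB, after: {after:.0f} MB")
```

Calling the helper once at the end of the timing script would give a single peak value that can be compared across devices and drivers.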

@ShyamSS-95 This looks like a driver issue. I tried running the test script on the CPU using both AMD's and Intel's drivers. Intel's driver requires about 12 GB of memory, while with AMD's driver the memory usage was as expected.

On Intel

homer:arrayfire-python.devel $ AF_OPENCL_DEFAULT_DEVICE=2 python /tmp/test_conversion.py 32
ArrayFire v3.6.0 (OpenCL, 64-bit Linux, build 12cde8db)
-0- AMD: Hawaii, 8163 MB
-1- NVIDIA: GeForce GTX 950, 1995 MB
[2] INTEL: Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz, 15985 MB
-3- AMD: Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz, 15985 MB
Conversion from af to np:
Time Elapsed for 100 loops = 13.246779441833496

Conversion from np to af:
Kernel compilation time = 0.5209839344024658
Time Elapsed for 100 loops = 72.9504280090332

On AMD

homer:arrayfire-python.devel $ AF_OPENCL_DEFAULT_DEVICE=3 python /tmp/test_conversion.py 32
ArrayFire v3.6.0 (OpenCL, 64-bit Linux, build 12cde8db)
-0- AMD: Hawaii, 8163 MB
-1- NVIDIA: GeForce GTX 950, 1995 MB
-2- INTEL: Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz, 15985 MB
[3] AMD: Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz, 15985 MB
Conversion from af to np:
Time Elapsed for 100 loops = 12.959728956222534

Conversion from np to af:
Kernel compilation time = 0.5871903896331787
Time Elapsed for 100 loops = 48.30834913253784

@pavanky Is the slower conversion from np.ndarray to af.Array on the CPU compared to the GPU expected? Since it's just a conversion, shouldn't the times be almost the same?

@ShyamSS-95 NumPy stores data in row-major format, while ArrayFire stores it in column-major format. To fix this, an af.reorder is performed after the copy.
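The layout mismatch can be illustrated with NumPy alone. This is a minimal sketch on an arbitrarily chosen 2x3 array (not data from the thread). It shows that a row-major buffer read as column-major comes out transposed, which is why an extra axis permutation is needed after the raw copy. It does not reproduce ArrayFire's actual conversion code.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)        # NumPy default: C order (row-major)

row_major = a.ravel(order="C")        # element order in NumPy's buffer: 0 1 2 3 4 5
col_major = a.ravel(order="F")        # order a column-major library expects: 0 3 1 4 2 5

# Reading the row-major buffer as a column-major array with reversed
# dimensions yields the transpose of the original array ...
reinterpreted = row_major.reshape((3, 2), order="F")

# ... so the axes must be permuted back to recover the original array.
restored = reinterpreted.T

print(np.array_equal(reinterpreted, a.T))  # True
print(np.array_equal(restored, a))         # True
```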

@ShyamSS-95 You can see the difference by copying a 1D array, where row-major versus column-major ordering doesn't matter.
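The 1D comparison suggested here could be timed roughly as follows. This is a sketch, not the original test script: the helper `time_conversion`, the 32x32x32 shape, and the loop count are assumptions made for illustration, and the thread does not say how `test.py` interprets its argument. It assumes `af.np_to_af_array` and `af.sync` from arrayfire-python are available, and skips the ArrayFire part if the package is not installed.

```python
import time

import numpy as np

try:
    import arrayfire as af
except ImportError:  # lets the sketch load on machines without ArrayFire
    af = None


def time_conversion(convert, host_array, loops=100, sync=None):
    """Return the seconds spent calling convert(host_array) `loops` times."""
    convert(host_array)              # warm-up keeps one-time setup out of the timing
    if sync is not None:
        sync()
    start = time.time()
    for _ in range(loops):
        convert(host_array)
    if sync is not None:
        sync()                       # wait for queued device work before stopping the clock
    return time.time() - start


# Same data in two shapes: 1D (layout irrelevant) and 3D (row-major in NumPy).
flat = np.random.rand(32 ** 3)
cube = flat.reshape(32, 32, 32)

if af is not None:
    for name, arr in (("1D", flat), ("3D", cube)):
        elapsed = time_conversion(af.np_to_af_array, arr, sync=af.sync)
        print(f"np to af, {name}: {elapsed} s for 100 loops")
```

If the 1D and 3D timings are close on the GPU but diverge on the CPU device, that would point at the reorder step rather than the copy itself.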