amd / OpenCL-caffe

This is an experimental OpenCL version of Caffe by AMD Research. We now recommend the official BVLC Caffe OpenCL branch at https://github.com/BVLC/caffe/tree/opencl

No GPU Device

Clearly I have an AMD GPU:

zhoub@zhoub:~$ lspci
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Oland [Radeon HD 8570 / R7 240 OEM]
01:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Cape Verde/Pitcairn HDMI Audio [Radeon HD 7700/7800 Series]

But it shows that:

xx@xx:~/OpenCL-caffe/build$ make runtest
[ 1%] Built target proto
[ 55%] Built target caffe
[ 55%] Built target gtest
[100%] Built target test.testbin
Current device id: 0
F1006 20:10:38.865067 11580 device.cpp:75] Err: No GPU devices
*** Check failure stack trace: ***
@ 0x2b5fc1facdaa (unknown)
@ 0x2b5fc1facce4 (unknown)
@ 0x2b5fc1fac6e6 (unknown)
@ 0x2b5fc1faf687 (unknown)
@ 0x2b5fc1941c44 caffe::Device::Init()
@ 0x6f4e48 main
@ 0x2b5fc3cc5ec5 (unknown)
@ 0x6f7d12 (unknown)
@ (nil) (unknown)
Aborted (core dumped)
make[3]: *** [src/caffe/test/CMakeFiles/runtest] Error 134
make[2]: *** [src/caffe/test/CMakeFiles/runtest.dir/all] Error 2
make[1]: *** [src/caffe/test/CMakeFiles/runtest.dir/rule] Error 2
make: *** [runtest] Error 2

What is the output of clinfo?

xx@xx:~$ clinfo
Number of platforms: 1
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 2.0 AMD-APP (1800.8)
Platform Name: AMD Accelerated Parallel Processing
Platform Vendor: Advanced Micro Devices, Inc.
Platform Extensions: cl_khr_icd cl_amd_event_callback cl_amd_offline_devices

Platform Name: AMD Accelerated Parallel Processing
Number of devices: 1
Device Type: CL_DEVICE_TYPE_CPU
Vendor ID: 1002h
Board name:
Max compute units: 8
Max work items dimensions: 3
Max work items[0]: 1024
Max work items[1]: 1024
Max work items[2]: 1024
Max work group size: 1024
Preferred vector width char: 16
Preferred vector width short: 8
Preferred vector width int: 4
Preferred vector width long: 2
Preferred vector width float: 8
Preferred vector width double: 4
Native vector width char: 16
Native vector width short: 8
Native vector width int: 4
Native vector width long: 2
Native vector width float: 8
Native vector width double: 4
Max clock frequency: 3605Mhz
Address bits: 64
Max memory allocation: 2147483648
Image support: Yes
Max number of images read arguments: 128
Max number of images write arguments: 64
Max image 2D width: 8192
Max image 2D height: 8192
Max image 3D width: 2048
Max image 3D height: 2048
Max image 3D depth: 2048
Max samplers within kernel: 16
Max size of kernel argument: 4096
Alignment (bits) of base address: 1024
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: Yes
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: Yes
Round to +ve and infinity: Yes
IEEE754-2008 fused multiply-add: Yes
Cache type: Read/Write
Cache line size: 64
Cache size: 32768
Global memory size: 8312074240
Constant buffer size: 65536
Max number of constant args: 8
Local memory type: Global
Local memory size: 32768
Max pipe arguments: 16
Max pipe active reservations: 16
Max pipe packet size: 2147483648
Max global variable size: 1879048192
Max global variable preferred total size: 1879048192
Max read/write image args: 64
Max on device events: 0
Queue on device max size: 0
Max on device queues: 0
Queue on device preferred size: 0
SVM capabilities:
Coarse grain buffer: No
Fine grain buffer: No
Fine grain system: No
Atomics: No
Preferred platform atomic alignment: 0
Preferred global atomic alignment: 0
Preferred local atomic alignment: 0
Kernel Preferred work group size multiple: 1
Error correction support: 0
Unified memory for Host and Device: 1
Profiling timer resolution: 1
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: Yes
Queue on Host properties:
Out-of-Order: No
Profiling : Yes
Queue on Device properties:
Out-of-Order: No
Profiling : No
Platform ID: 0x7fda8a12f430
Name: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
Vendor: GenuineIntel
Device OpenCL C version: OpenCL C 1.2
Driver version: 1800.8 (sse2,avx)
Profile: FULL_PROFILE
Version: OpenCL 1.2 AMD-APP (1800.8)
Extensions: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_spir cl_khr_gl_event

From the clinfo output:

Device Type: CL_DEVICE_TYPE_CPU

That's a CPU, i.e. not a GPU, an integrated GPU, or an APU. I guess that AMD Caffe only supports GPUs (maybe APUs?), though it would need either an AMD dev to confirm, or someone to browse through the code a bit.
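
To spot this quickly in a long clinfo dump, you can count the device types instead of reading every line. A minimal sketch, assuming clinfo prints one "Device Type:" line per device as in the output above (other clinfo builds may format this differently); the function name summarize_cl_device_types is made up for illustration:

```shell
#!/bin/sh
# Count how many devices of each type appear in clinfo output read from stdin.
# Assumes one "Device Type: CL_DEVICE_TYPE_xxx" line per device, as in the
# AMD clinfo output quoted in this thread.
summarize_cl_device_types() {
  awk '/Device Type:/ { count[$NF]++ }
       END { for (t in count) print t, count[t] }' | sort
}

# Only query the real system when clinfo is installed.
if command -v clinfo >/dev/null 2>&1; then
  clinfo | summarize_cl_device_types
fi
```

For the output posted above this would list only CL_DEVICE_TYPE_CPU; once the GPU driver is working, a CL_DEVICE_TYPE_GPU line should appear as well.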

From the code, device.cpp:

  clGetDeviceIDs(PlatformIDs[0], CL_DEVICE_TYPE_GPU, 0, NULL, &numDevices);
  uiNumDevices = numDevices;
  if (0 == uiNumDevices) {
    LOG(FATAL) << "Err: No GPU devices";

The code is only looking for GPU devices, not CPU devices.

Thanks for your help, but you can see here:

zhoub@zhoub:~$ lspci
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Oland [Radeon HD 8570 / R7 240 OEM]

that clearly I have one GPU. Why can't OpenCL discover it automatically?

You're probably missing the matching OpenCL drivers for your GPU card, or there is some issue in their installation. Until the GPU appears in the clinfo output, AMD Caffe won't find it.

Apparently clinfo cannot recognise your GPU device. You should fix that first; maybe try reinstalling the driver.

On the other hand it would be nice that Caffe supports any kind of device, especially useful for debugging. :-)

Well.... personally, for cltorch, initially I allowed all devices, but then I got tons of support requests caused by running on the CPU part, so I only show GPUs and APUs now.

I have reinstalled the AMD opencl2 driver, but clinfo still does not show the GPU.

It's out of scope for Caffe I reckon, but anyway, if it was me, I would check the following things:

  • download latest GPU driver from AMD for my particular GPU model and OS, and install that, and reboot
  • check that an appropriate icd is in /etc/OpenCL/vendors (presumably you already have an AMD icd there, since it's showing the AMD platform for the cpu; might be an old icd version though perhaps?)
  • Do lsmod, and check the gpu driver is present
  • Try some of the following commands: dmesg | grep -i gpu, dmesg | grep -i amd, and check for errors, maybe also something like dmesg | grep EE
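
The checks in the list above can be gathered into one script. This is only a sketch: /etc/OpenCL/vendors comes from the list, while the kernel module names fglrx, radeon and amdgpu and the helper names list_icds and filter_amd_modules are my assumptions, so adjust them to whatever your driver actually installs.

```shell
#!/bin/sh
# List the .icd files registered in a vendors directory, with the library
# name each one points at (an ICD file normally holds one library name).
list_icds() (
  dir=${1:-/etc/OpenCL/vendors}
  for f in "$dir"/*.icd; do
    [ -e "$f" ] && printf '%s: %s\n' "$(basename "$f")" "$(head -n 1 "$f")"
  done
  return 0
)

# Filter lsmod-style text on stdin for AMD GPU kernel modules.
# The module names here are assumptions; extend the pattern as needed.
filter_amd_modules() {
  grep -E '^(fglrx|radeon|amdgpu)([[:space:]]|$)' || true
}

list_icds
if command -v lsmod >/dev/null 2>&1; then
  lsmod | filter_amd_modules
fi
# dmesg may need root on some systems; errors are silenced here.
dmesg 2>/dev/null | grep -iE 'gpu|amd|fglrx|radeon' | tail -n 20
```

If list_icds prints nothing, or filter_amd_modules finds no module, that points at the driver installation rather than at Caffe.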

Note that an APU is CPU + GPU, so if you have an APU, and you have the proper drivers, you will see two OpenCL devices: one for the CPU and one for the GPU.

(Sent by email in reply to Hugh Perkins' comment of Tuesday, October 06, 2015 7:17 AM, which quoted the device.cpp snippet above and read: "AMD Caffe supports only GPUs. Not APUs, nor CPUs.")

Thank you!

@jgoldsAMD Well, I meant CL_DEVICE_TYPE_ACCELERATOR. I think this device type is only used by certain Intel devices currently. I'm not sure if APU is the correct abbreviation for this; possibly I used the wrong abbreviation :-P

(seems like this device type is used by Intel Xeon Phi Coprocessors, eg https://software.intel.com/en-us/node/540596 https://software.intel.com/en-us/node/540573 )

That makes sense. Perhaps the FPGA devices appear as ACCELERATORS as well, I don’t have any experience with them.

(Sent by email in reply to Hugh Perkins' comment of Tuesday, October 06, 2015 9:44 AM.)

@hughperkins @jgoldsAMD
We support two ways of selecting devices in the code. You can specify a device ID (can be GPU or APU). If no device ID is specified, we will find a GPU for you. Please refer to the following code. If you have suggestions for making this better or easier for users, please let us know.

  if (deviceId == -1) {
    int i;
    for (i = 0; i < (int) uiNumDevices; i++) {
      clGetDeviceInfo(pDevices[i], CL_DEVICE_HOST_UNIFIED_MEMORY,
                      sizeof(cl_bool), &unified_memory, NULL);
      if (!unified_memory) {  // skip iGPU
        // we pick the first dGPU we found
        pDevices[0] = pDevices[i];
        device_id = i;
        LOG(INFO) << "Picked default device type : dGPU " << device_id;
        break;
      }
    }
    if (i == uiNumDevices) {
      LOG(FATAL) << "Cannot find any dGPU! ";
    }
  } else if (deviceId >= 0 && deviceId < uiNumDevices) {
    pDevices[0] = pDevices[deviceId];
    device_id = deviceId;
    LOG(INFO) << "Picked device type : GPU " << device_id;
  } else {
    LOG(FATAL) << " Invalid GPU deviceId! ";
  }

@jgoldsAMD Hi Jeff, thanks very much for your reply. It is very nice to see you following this code. I guess there will be more questions about OpenCL drivers and such, which we will need your help with.

@zhoubinxyz Are you accessing your machine remotely? We once ran into the same issue when we logged in remotely through a newly created account. I think we might have some issues in remote access support, and it seems that only one user at a time can see the GPU. @jgoldsAMD do you know this issue and how to solve it?

AFAIK, headless support was working with latest drivers. There was an issue that you had to be root to run on the GPU, but I believe that was addressed as well. You can try starting X to see if that makes a difference, but it shouldn’t be required.

(Sent by email in reply to Junli Gu's comment of Tuesday, October 06, 2015 1:08 PM.)

@gujunli No I am not using remote access. May I know what other information I can provide? It would help us a lot if we can make it run on this type of GPU.

@zhoubinxyz : assuming you have the latest drivers for your card installed, Jeff's and Junli's heads-up about remote access, use of root, and X on/off are probably good things to try I reckon. eg try various combinations of shut down x, or start x, use root or not, eg:

sudo service lightdm stop
ps -ef | grep X  # make sure no x server running
clinfo   #  does this show the GPU?
sudo clinfo   # this?

Edit: I remember now that on my own amd-box:

  • with x stopped, all works ok (can't remember whether root is needed)
  • with x started, the gpu is not available until you log in interactively to the desktop. ie if you just start the box, x starts, and you then ssh into it, there is no gpu. But plug in a keyboard and monitor, get past the lightdm login screen, and then the gpu works ok

@hughperkins Hi,
After stopping lightdm, I logged in on a tty.
ps -ef | grep X shows:
xx 2446 2303 0 13:11 tty1 00:00:00 grep --color=auto X
clinfo shows the same information as before.

But sudo clinfo shows that:
clinfo: error while loading shared libraries: libOpenCL.so.1: cannot open shared object file: No such file or directory

Since you are seeing the CPU device, I can only assume that the library path is not the same for root as for your user as clinfo relies on the OpenCL lib being present.
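
One way to test that theory: sudo normally resets the environment, so an LD_LIBRARY_PATH set in your user shell may not reach clinfo when it runs as root. The sketch below looks for libOpenCL.so.1 in a colon-separated directory list; find_libopencl is a hypothetical helper name, and it assumes the library is being found through LD_LIBRARY_PATH rather than through the system linker cache.

```shell
#!/bin/sh
# Print the first libOpenCL.so.1 found in a colon-separated directory list.
# Returns non-zero when nothing is found. The body runs in a subshell so the
# IFS and globbing changes do not leak into the caller.
find_libopencl() (
  IFS=:
  set -f   # no glob expansion while splitting the list
  for dir in $1; do
    if [ -n "$dir" ] && [ -e "$dir/libOpenCL.so.1" ]; then
      printf '%s\n' "$dir/libOpenCL.so.1"
      return 0
    fi
  done
  return 1
)

# Where the current user's environment would find the library, if anywhere.
find_libopencl "${LD_LIBRARY_PATH:-}" || echo "libOpenCL.so.1 not found via LD_LIBRARY_PATH"
```

If this prints a path for your normal user, passing the same variable through explicitly, for example sudo env "LD_LIBRARY_PATH=$LD_LIBRARY_PATH" clinfo, might let the root run load the library too, depending on your sudo policy.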

Which graphics driver have you installed? Not sure I saw that info earlier.

(Sent by email in reply to zhoubinxyz's comment of Tuesday, October 06, 2015 11:16 PM.)

@zhoubinxyz, have you fixed all the complaints from the driver installation, and run "aticonfig --initial" after installing? If the answer is yes, then try "export DISPLAY=:0.0".
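
As a shell sketch of that last step: the variable is assigned with DISPLAY=:0.0 (the colon belongs to the value, not to the operator). The value :0.0 is the conventional first local display and is an assumption about your setup; the clinfo check is skipped when clinfo is not installed.

```shell
#!/bin/sh
# Point the session at the local X display before querying OpenCL devices.
# ":0.0" is an assumption; use whatever display your X server runs on.
export DISPLAY=:0.0

if command -v clinfo >/dev/null 2>&1; then
  # Show only the device type lines; no match is not treated as an error.
  clinfo | grep 'Device Type' || true
fi
```

If a CL_DEVICE_TYPE_GPU line shows up only after the export, the GPU was being hidden by the display setup rather than by a missing driver.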

@zhoubinxyz Just checking in: did you fix the problem?

Sorry, I have been busy these days. I will update if I have any news, thank you :)

Hey! Apparently the solution to this issue might be to use sudo su, see hughperkins/cltorch#74 (comment)

Hi,

Is there any solution for this problem? I installed OpenCL on an AMD ATI Radeon HD 7600M series card and am getting the same error.

~/OpenCL-caffe$ clinfo | grep CPU
  Device Type:					 CL_DEVICE_TYPE_CPU

~/OpenCL-caffe$ clinfo | grep GPU    # no output

However, glxinfo displays the AMD GPU.

~/OpenCL-caffe$ glxinfo  | grep OpenGL
OpenGL vendor string: Advanced Micro Devices, Inc.
OpenGL renderer string: AMD Radeon HD 7600M Series
OpenGL core profile version string: 4.3.13416 Core Profile Context 15.201.1151

ERROR:

~/OpenCL-caffe$ make runtest
[100%] Built target proto
[100%] Built target caffe
[100%] Built target gtest
[100%] Built target test.testbin
Current device id: 0
F0304 18:49:09.351958  1995 device.cpp:75] Err: No GPU devices
*** Check failure stack trace: ***
    @     0x2b783ee9cdaa  (unknown)
    @     0x2b783ee9cce4  (unknown)
    @     0x2b783ee9c6e6  (unknown)
    @     0x2b783ee9f687  (unknown)
    @     0x2b783e8322a4  caffe::Device::Init()
    @           0x6f4468  main
    @     0x2b7840bb5f45  (unknown)
    @           0x6f7d02  (unknown)
    @              (nil)  (unknown)
Aborted (core dumped)