amd / OpenCL-caffe

This is an experimental OpenCL version of Caffe by AMD Research. We now recommend the official BVLC Caffe OpenCL branch at https://github.com/BVLC/caffe/tree/opencl

runtest: Failed to build program

aelnouby opened this issue · comments

Had the same problem when building from a separate build directory.

Changing
std::string oclKernelPath = "./src/caffe/ocl/";
to
std::string oclKernelPath = "./../src/caffe/ocl/";

in src/caffe/device.cpp fixed the problem; however, the path should be set in a better way.
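One way the path could be made independent of the working directory is to read the Caffe root from an environment variable. This is only a sketch: `resolveOclKernelPath` and the `CAFFE_ROOT` variable are hypothetical names chosen for illustration and are not part of OpenCL-caffe.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper, not part of OpenCL-caffe.
// Builds the kernel directory from the CAFFE_ROOT environment variable
// (an assumed name) instead of a path relative to the working directory.
std::string resolveOclKernelPath() {
  const char* root = std::getenv("CAFFE_ROOT");
  // Fall back to the original behaviour (current directory) when unset or empty.
  std::string base = (root != nullptr && root[0] != '\0') ? root : ".";
  if (base.back() != '/') base += '/';
  return base + "src/caffe/ocl/";
}
```

With this, `std::string oclKernelPath = resolveOclKernelPath();` would keep working from the repository root and also from a separate build directory once the variable is exported.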

I did something very similar to fix Err: Open ocl dir failed!: I explicitly set the Caffe root instead of .

However, this didn't seem to fix the Failed to build program error.

@gujunli Could you help me with this, please ?

Hi Yibing,

Could you take a look at these path problems?

Sorry, guys, for the late reply. Just a notice that we no longer work for AMD. I don't think we can still maintain the project for AMD. But this is an interesting project. We might move it to our personal GitHub to maintain it as an open source project.

Thanks a lot!
Junli

On May 4, 2016, at 3:22 AM, Alaa El-Nouby notifications@github.com wrote:

@gujunli https://github.com/gujunli Could you help me with this, please ?



@smistad When building and running Caffe, you are assumed to be in the ROOT directory, which can be verified from the path settings in the other *.sh scripts. @aelnouby Besides the build location, have you made sure that the ocl dir is present and readable?

@kuke I believe it is present and readable; please tell me if there is a way to check.
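A quick way to check from code is to try opening the directory and counting the kernel files in it. The helper below is a hypothetical sketch (`countKernelFiles` is not an OpenCL-caffe function) that uses only POSIX `opendir`/`readdir`.

```cpp
#include <dirent.h>
#include <cstring>
#include <string>

// Hypothetical helper: returns the number of *.cl files in `dir`,
// or -1 if the directory cannot be opened (missing or unreadable).
int countKernelFiles(const std::string& dir) {
  DIR* d = opendir(dir.c_str());
  if (d == nullptr) return -1;
  int count = 0;
  while (dirent* entry = readdir(d)) {
    const char* name = entry->d_name;
    const std::size_t len = std::strlen(name);
    // Count entries whose name ends in ".cl".
    if (len > 3 && std::strcmp(name + len - 3, ".cl") == 0) ++count;
  }
  closedir(d);
  return count;
}
```

Calling `countKernelFiles("./src/caffe/ocl/")` from the directory where the binary is run would return -1 if the path is wrong for that working directory, and 0 if the directory opens but holds no kernel sources.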

I think the problem is in the line
cl_int iStatus = clBuildProgram(Program, 1, pDevices, buildOption.c_str(),
The output is -11, which is CL_BUILD_PROGRAM_FAILURE.

I honestly don't know what this means exactly.
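For reference, the status codes returned by clBuildProgram are plain integers defined in CL/cl.h. The sketch below maps a few build-related values to their names; `clErrorName` is a hypothetical helper and deliberately does not include the OpenCL headers, so it only covers the codes listed.

```cpp
#include <string>

// Hypothetical helper: names for a few build-related OpenCL status codes.
// The numeric values are those defined in CL/cl.h; the list is not exhaustive.
std::string clErrorName(int status) {
  switch (status) {
    case 0:   return "CL_SUCCESS";
    case -3:  return "CL_COMPILER_NOT_AVAILABLE";
    case -11: return "CL_BUILD_PROGRAM_FAILURE";
    case -42: return "CL_INVALID_BINARY";
    case -43: return "CL_INVALID_BUILD_OPTIONS";
    case -44: return "CL_INVALID_PROGRAM";
    default:  return "unlisted OpenCL status " + std::to_string(status);
  }
}
```

CL_BUILD_PROGRAM_FAILURE means the OpenCL compiler rejected the kernel source for that device. The compiler's own messages can usually be retrieved by calling clGetProgramBuildInfo with CL_PROGRAM_BUILD_LOG right after the failed clBuildProgram call, which tends to be more informative than the status code alone.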

@aelnouby Your logs indicate that the device worked properly, as did access to the ocl files. The build failure very likely results from syntax errors in the *.cl files, so have you changed the ocl source code, intentionally or unintentionally?

I actually removed everything and made a fresh clone of the repo; the exact same issue happened.

Some information that might or might not be useful: before installing opencl-caffe I was using regular Caffe, so I don't know whether this might cause issues with environment variables and the like.

Also, if I use sudo clinfo I get the CPU only; no GPU is reported.

Maybe your AMD driver has some problems, have you ever successfully built a simple OpenCL program?

I only tried the HelloWorld in the /opt/AMDAPPSDK-3.0/samples/opencl/bin/x86_64 directory, and it worked fine. Could you tell me anything else to try?

Just now I have tried this example by @smistad

Output of gcc -I /opt/AMDAPPSDK-3.0/include -L/opt/AMDAPPSDK-3.0/lib/x86_64/ -o main main.c -Wl,-rpath,/opt/AMDAPPSDK-3.0/lib/x86_64/ -lOpenCL

main.c: In function ‘main’:
main.c:50:5: warning: ‘clCreateCommandQueue’ is deprecated [-Wdeprecated-declarations]
cl_command_queue command_queue = clCreateCommandQueue(context, device_id, 0, &ret);
^
In file included from main.c:7:0:
/opt/AMDAPPSDK-3.0/include/CL/cl.h:1359:1: note: declared here
clCreateCommandQueue(cl_context /* context */,

Output of ./main :

0 + 1024 = 1024
1 + 1023 = 1024
2 + 1022 = 1024
3 + 1021 = 1024
4 + 1020 = 1024
5 + 1019 = 1024
6 + 1018 = 1024
7 + 1017 = 1024
8 + 1016 = 1024
9 + 1015 = 1024
10 + 1014 = 1024
11 + 1013 = 1024
12 + 1012 = 1024
13 + 1011 = 1024
............... till the end

Which I believe is the correct behaviour.

It seems quite weird. We just tested OpenCL caffe on several server-end GPUs. I can't give you a reasonable explanation yet.

@kuke Thanks a lot for your time. Are there any recommendations for this situation: reinstall something, try a different OS, or anything else? I am in desperate need of this.

I thought of one important thing. You installed the CUDA SDK on the machine when using regular Caffe, right? You'd better remove it cleanly and reinstall the AMD driver.

I didn't install the CUDA SDK; I was using the CPU-only option.

I actually have installed opencl-catalyst and opencl-headers from AUR

The file in /etc/OpenCL/vendors is amdocl64.icd

I don't know if this information is relevant.
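On Linux, the OpenCL ICD loader discovers vendor runtimes through the *.icd files in /etc/OpenCL/vendors, each of which names the vendor library to load. To see which library each file points to, something like the following could be used; `listIcdLibraries` is a hypothetical helper written for this illustration.

```cpp
#include <dirent.h>
#include <fstream>
#include <map>
#include <string>

// Hypothetical helper: maps each *.icd file in `vendorDir` to the first
// line it contains, which is where the vendor library is named.
// Returns an empty map if the directory cannot be opened.
std::map<std::string, std::string> listIcdLibraries(const std::string& vendorDir) {
  std::map<std::string, std::string> result;
  DIR* d = opendir(vendorDir.c_str());
  if (d == nullptr) return result;
  while (dirent* entry = readdir(d)) {
    const std::string name = entry->d_name;
    // Only consider entries whose name ends in ".icd".
    if (name.size() > 4 && name.compare(name.size() - 4, 4, ".icd") == 0) {
      std::ifstream in(vendorDir + "/" + name);
      std::string library;
      std::getline(in, library);
      result[name] = library;
    }
  }
  closedir(d);
  return result;
}
```

Running `listIcdLibraries("/etc/OpenCL/vendors")` and checking that the named library actually exists on the system can help tell whether the runtime in use comes from the packaged driver or from another install.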

Yes. It is possible that the open source OpenCL runtime doesn't support the build options. You can try the official driver from AMD.

@aelnouby We haven't tested on Arch before, but I think the problem is with the OpenCL header files. You can try downloading and reinstalling everything OpenCL-related from the official AMD website instead of from AUR.

I have the same problems with an AMD Radeon HD 6570. I'm using fglrx 15.2, AMDAPPSDK-3.0, and ACML6, all from the AMD website. I also tried using AMDAPP-2.9.1 and ACML-5.3.1, but received the same sorts of errors. It might be a problem with clBLAS, or clBLAS is running into the same problem as caffe. I tried a bunch of different clBLAS versions (both from source and the binaries), and test-functional always fails while running the ERROR tests.

clinfo: https://gist.github.com/patmarks/f6a47f9db528d33a0ab6def34ca4c89b
Here's the result of running ./build/test/test.testbin -alsologtostderr=1 from the caffe root directory: https://gist.github.com/patmarks/39bb7e150bee0c1256efc44da498b911

I was also able to build the simple example by @smistad (although I had to modify the makefile so that gcc uses '-L' to find libOpenCL.so).

@aelnouby what happens when you run test-functional? Mine is aborted while running the ERROR tests. https://gist.github.com/patmarks/3992f43f981c2b7759165e55394e36da

OpenCL error -11 on line 244 of /home/pm/Documents/jupyter/opencl/clBLAS1/src/library/blas/xgemm.cc
test-functional: /home/pm/Documents/jupyter/opencl/clBLAS1/src/library/blas/xgemm.cc:244: void makeGemmKernel(cl_kernel**, cl_command_queue, const char, const char_, const unsigned char**, size_t_, const char_): Assertion `false' failed.
Aborted
