OAID / Caffe-HRT

Heterogeneous Run Time version of Caffe. It adds heterogeneous capabilities to Caffe and uses a heterogeneous computing infrastructure framework to speed up deep learning on Arm-based heterogeneous embedded platforms. It also retains all the features of the original Caffe architecture, so users can deploy their applications seamlessly.


OpenCL GPU acceleration with caffeOnACL

JammyZhou opened this issue

To enable the build for GPU support with caffeOnACL, I made the two changes below:

  • Comment out “CPU_ONLY := 1” in Makefile.config
  • Comment out “COMMON_FLAGS += -DCPU_ONLY” in Makefile

But the build failed with the error below. It looks like GPU support is not ready yet for caffeOnACL, since the CUDA-related files and code are still in that path. Did I miss something?

CXX src/caffe/solvers/nesterov_solver.cpp
In file included from ./include/caffe/common.hpp:19:0,
                 from ./include/caffe/blob.hpp:8,
                 from ./include/caffe/net.hpp:10,
                 from ./include/caffe/solver.hpp:7,
                 from ./include/caffe/sgd_solvers.hpp:7,
                 from src/caffe/solvers/nesterov_solver.cpp:3:
./include/caffe/util/device_alternate.hpp:38:23: fatal error: cublas_v2.h: No such file or directory
 #include <cublas_v2.h>
                       ^
compilation terminated.
Makefile:622: recipe for target '.build_release/src/caffe/solvers/nesterov_solver.o' failed
make: *** [.build_release/src/caffe/solvers/nesterov_solver.o] Error 1

Hi Jammy,
We use “CPU_ONLY” mode to support ACL. Although we compile the source code with CPU_ONLY, we can still call Caffe::set_mode(Caffe::GPU) to use the ARM GPU in our application. You can refer to ./examples/cpp_classification/xxx_gpu.cpp as an example.
Regards,
Honggui

Hi Honggui,

Thanks for your reply. I can confirm that OpenCL can be used by classification_profiling_gpu.bin, but I ran into the error below, which is similar to the one I reported for mxnetOnACL in OAID/MXNet-HRT#3. Do you have any insight into it?

classification_profiling_gpu.bin: tools/intern/llvmufgen/HalfSupport.cpp:163: llvm::Value* {anonymous}::HalfSupportPass::getValueAs(llvm::Value*) [with bool ToHalf = false]: Assertion `(isa<Constant>(val) || is<Half>(val) != 0) && "Requested value isn't half."' failed.
Stack dump:
0.	Running pass 'HalfSupportPass' on module 'BuildGroup_2'.
Aborted

The problem is caused by missing cl_khr_fp16 support on my ARM platform.
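
For anyone hitting the same assertion, here is a minimal sketch of how the extension could be checked before running the GPU path. It assumes you already have the device's extension string, which in OpenCL is a space-separated list returned by clGetDeviceInfo with CL_DEVICE_EXTENSIONS. The helper names has_extension and supports_fp16 are hypothetical and are not part of Caffe-HRT or ACL.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper (not part of Caffe-HRT or ACL): returns true when
// `name` appears as a whole, whitespace-separated token in `extensions`.
// On a real device the `extensions` string would come from
// clGetDeviceInfo(device, CL_DEVICE_EXTENSIONS, ...).
bool has_extension(const std::string& extensions, const std::string& name) {
    std::istringstream tokens(extensions);
    std::string token;
    while (tokens >> token) {
        // Compare whole tokens so that a longer name sharing the same
        // prefix is not mistaken for a match.
        if (token == name) {
            return true;
        }
    }
    return false;
}

// Convenience wrapper for the extension discussed in this thread.
bool supports_fp16(const std::string& extensions) {
    return has_extension(extensions, "cl_khr_fp16");
}
```

If supports_fp16 returns false for the device's extension string, the platform lacks half-precision support in OpenCL kernels, which matches the cause described above.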