CNugteren / CLBlast

Tuned OpenCL BLAS

Segmentation fault with Octave-ocl

tangjinchuan opened this issue

Dear Cedric,
I am a big fan of your work clblast as well as your OpenCL tutorial on GEMM.
Recently I have been trying to integrate clblast, either as a dynamic .so or as a static .a, into Octave-ocl, so that Octave-ocl / GNU Octave can benefit from the faster SGEMM. However, I keep encountering segmentation faults while incorporating the C-based SGEMM sample.

I did pay attention to the building and installing page, including -fPIC for the static lib, but had no luck.

The following is the snippet of the makefile used to compile the Octave-ocl project.

Thank you very much!
Best wishes,
Jinchuan Tang

TARGET = ocl_b1n.oct

M_FILES = \
  oclArray.m \
  ocl_to_octave.m \
  gpuArray.m \
  gather.m \
  ocl_program_file.m \
  ocl_tests.m

OBJ_FILES = \
  ocl_constant.o \
  ocl_lib.o \
  ocl_context.o \
  ocl_context_obj.o \
  ocl_program.o \
  ocl_memobj.o \
  ocl_array.o \
  ocl_array_prog.o \
  ocl_ov_matrix.o \
  ocl_ov_matrix_ops.o \
  ocl_ov_matrix_fcns.o \
  ocl_ov_program.o \
  ocl_ov_types.o \
  genFFT.o \
  fftCore.o \
  transform.o \
  accessors.o \
  plan.o \
  repo.o \
  generator_stockham.o \
  generator_transpose_gcn.o \
  generator_transpose.o \
  action_transpose.o \
  generator_copy.o \
  lifetime.o \
  fft_binary_lookup.o \
  md5sum.o \
  enqueue.o \
  stdafx.o

C_FILES = $(OBJ_FILES:.o=.cc)

CC = $(MKOCTFILE)
LD = $(MKOCTFILE)

# main target

$(TARGET): $(OBJ_FILES)
	$(LD) $(OBJ_FILES) -L/user/local/lib64 -lclblast -o $(TARGET)

.SUFFIXES=
.SUFFIXES= .cc .o

.cc.o:
	$(CC)  -pipe -c $< 

# helper targets for local development

Good to hear you like CLBlast. However, I'm not too familiar with Octave and how to debug things there. In general I would recommend one of the following steps; I hope they help in your situation:

  1. Try to reproduce the error in the smallest possible example. Does it work without Octave (just plain C or C++)? Do the CLBlast samples segfault as well?
  2. Try to debug the issue using a debugger, to pinpoint the location in code, e.g. through gdb.
  3. Compile CLBlast in verbose mode. That way messages will be printed, giving some indication of where it segfaults and what it was doing before that.

Dear Cedric,
I may have located the problem. It is in the Xgemm constructor, where it seems that Routine loads the .opencl source code at run time (clblast.cpp -> Gemm -> line: auto routine = Xgemm(queue_cpp, event);). It appears that, after linking the clblast lib and the octave-ocl objects into one binary, the binary cannot locate those kernels for CompileFromSource. I also tried copying the kernel files to several locations, but still had no luck finding the right place.

template <typename T>
Xgemm<T>::Xgemm(Queue &queue, EventPointer event, const std::string &name):
    Routine(queue, event, name,
            {"Copy","Pad","Transpose","Padtranspose","Xgemm","XgemmDirect","GemmRoutine"},
            PrecisionValue<T>(), {}, {
    #include "../../kernels/level3/level3.opencl"
    #include "../../kernels/level3/copy_fast.opencl"
    #include "../../kernels/level3/copy_pad.opencl"
    #include "../../kernels/level3/transpose_fast.opencl"
    #include "../../kernels/level3/transpose_pad.opencl"
    #include "../../kernels/level3/convert_symmetric.opencl"
    #include "../../kernels/level3/convert_triangular.opencl"
    #include "../../kernels/level3/convert_hermitian.opencl"
    , // separated in multiple parts to prevent C1091 in MSVC 2013
    #include "../../kernels/level3/xgemm_direct_part1.opencl"
    #include "../../kernels/level3/xgemm_direct_part2.opencl"
    #include "../../kernels/level3/xgemm_direct_part3.opencl"
    , // separated in multiple parts to prevent C1091 in MSVC 2013
    #include "../../kernels/level3/xgemm_part1.opencl"
    #include "../../kernels/level3/xgemm_part2.opencl"
    , // separated in multiple parts to prevent C1091 in MSVC 2013
    #include "../../kernels/level3/xgemm_part3.opencl"
    #include "../../kernels/level3/xgemm_part4.opencl"
    }) {

Best wishes,
Jinchuan

Glad you made some progress. However, copying the kernels somewhere will not work: those kernels are included already at compile time of the library, not afterwards. They are normal C++ pre-processor include statements and are included as C++11 raw string literals, e.g. this one. That's why the kernels start with this R"( symbol, e.g. here.
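The compile-time embedding described above can be sketched as follows. This is only a minimal illustration, not CLBlast's actual code: the kernel text, the name kDemoKernelSource and the helper DemoSourceContains are hypothetical, and the two raw string literals are written inline where the real library would pull them in through #include lines. Adjacent string literals are concatenated by the compiler, so the whole source ends up inside the binary and no kernel file is looked up at run time.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for kernel source. In the library, each piece would
// come from an #include of a file whose contents start with R"( and end
// with )". Here the pieces are written inline so the sketch is self-contained.
// Adjacent string literals are merged into one string at compile time.
const std::string kDemoKernelSource =
R"(
__kernel void demo_scale(__global float* x, const float alpha) {
)"
R"(
  x[get_global_id(0)] *= alpha;
}
)";

// Hypothetical helper: returns true if the embedded source contains `needle`.
bool DemoSourceContains(const std::string& needle) {
  return kDemoKernelSource.find(needle) != std::string::npos;
}
```

Calling DemoSourceContains("demo_scale") returns true without any file access, which is why copying .opencl files next to the binary has no effect.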

If it does indeed crash at this place, perhaps there is not enough stack space (or similar) to store the kernels? Can you try a simpler routine, such as xAXPY? Or try increasing the stack size or other (Octave-specific) memory limits?

Dear Cedric,
I have located the problem. I am writing it down here in case anyone else encounters the same problem.
The main problem was that the Octave-ocl package can load different OpenCL libraries, so it has its own mechanism for loading the library. Thus, I must remove all direct references to the cl.h file and include the ocl_lib.h file provided by Octave-ocl instead.
I tried to get rid of
/*#if defined(__APPLE__) || defined(__MACOSX)
#include <OpenCL/opencl.h>
#else
#include <CL/opencl.h>
#endif
*/
in all the files in the clblast include directory, replacing it with #include "ocl_lib.h", but had no luck.
Today, after reading your message, I found that there is one more such include, in clpp11.hpp in the src directory of clblast. After replacing the same snippet there with #include "ocl_lib.h", there is no segmentation fault any longer.
The same principle can be applied to VKFFT.h, if anyone is interested in the fastest FFT on GPU.

I may try to incorporate clblast into my open-source project Octave-ocl Extra and inform Mat, the author of the downstream project Octave-ocl, in case someone wants a GPUArray experience similar to MATLAB's.

Thank you very much!
Best wishes,
Jinchuan Tang

The crash was triggered at Xgemm because that is where the queue is used to call clpp11 functions to get the context and device, hence the segmentation fault there.
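For readers unfamiliar with run-time library loading, the general idea can be sketched as below. This is an assumption about how such a loader might behave, not the actual contents of ocl_lib.h or clpp11.hpp: every name here (DemoDispatch, DemoLoad, DemoGetContext, DemoRealGetContext) is hypothetical, and plain integers stand in for OpenCL handles. The point is only that an entry point reached through a run-time-filled table is unusable until the loader has run, so code that bypasses the loader's header can end up calling something that was never resolved.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical signature standing in for an OpenCL entry point.
using GetContextFn = int (*)(int queue_handle);

// Hypothetical dispatch table that a loader fills in at run time.
struct DemoDispatch {
  GetContextFn get_context = nullptr;  // unresolved until DemoLoad runs
};

// Stand-in for the function a real loader would look up in the library.
int DemoRealGetContext(int queue_handle) { return queue_handle + 1000; }

// Stand-in for the loader: resolves the entry point and stores it.
void DemoLoad(DemoDispatch& table) { table.get_context = &DemoRealGetContext; }

// Guarded call: reports an unresolved entry point instead of jumping
// through a null pointer, which would typically crash the process.
int DemoGetContext(const DemoDispatch& table, int queue_handle) {
  if (table.get_context == nullptr) {
    throw std::runtime_error("entry point not loaded");
  }
  return table.get_context(queue_handle);
}
```

With a freshly constructed DemoDispatch, DemoGetContext throws; after DemoLoad it returns a value. An unguarded call in the first state is the kind of failure that surfaces as a segmentation fault.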