Segfault in libamdocl12cl64.so when running tests on R9 270X
sliterok opened this issue
sliterok commented
Linux Mint 17.3
ASUS R9 270X
fglrx 2:15.200-0ubuntu0.5
AMDAPPSDK-3.0
(gdb) run
Starting program: /home/mint/new/caffe/build/test/test.testbin
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Traceback (most recent call last):
  File "/usr/share/gdb/auto-load/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.19-gdb.py", line 63, in <module>
    from libstdcxx.v6.printers import register_libstdcxx_printers
ImportError: No module named 'libstdcxx'
[New Thread 0x7fffe1808700 (LWP 20356)]
Current device id: 0

Program received signal SIGSEGV, Segmentation fault.
0x00007fffd9e7defa in ?? () from /usr/lib/libamdocl12cl64.so
(gdb) bt full
#0  0x00007fffd9e7defa in ?? () from /usr/lib/libamdocl12cl64.so
No symbol table info available.
#1  0x00007fffd9e7e6e8 in ?? () from /usr/lib/libamdocl12cl64.so
No symbol table info available.
#2  0x00007ffff56145ea in __cxa_finalize (d=0x7fffdd006c40) at cxa_finalize.c:56
        check = 1314
        cxafn = <optimized out>
        cxaarg = <optimized out>
        f = 0xf07800
        funcs = 0xf077d0
#3  0x00007fffd9e52e16 in ?? () from /usr/lib/libamdocl12cl64.so
No symbol table info available.
#4  0x000000000000008a in ?? ()
No symbol table info available.
#5  0x0000000000000000 in ?? ()
No symbol table info available.
Hugh Perkins commented
My guess is that the OpenCL compiler is crashing. If it were me and I wanted to diagnose/fix it, here is what I would do:
- First, run all the clBLAS samples and check whether any of them crash too.
- If they do crash, open an issue in the clBLAS repo, but you can still carry on with the steps below.
- Otherwise, find the simplest program that causes the crash.
- Figure out which kernel(s) that program uses.
- Comment out everything inside those kernels so they do nothing, i.e. each kernel looks like the appendix below.
- Verify that the program no longer crashes.
- Uncomment the code bit by bit until you find which part causes the crash.
Appendix: a fake example of a commented-out kernel
// pointer arguments of a kernel need an address-space qualifier such as global
kernel void someKernel(global float *data) { // don't comment out the declaration
    // some
    // commented
    // out
    // stuff
    // ...
} // don't comment out the closing bracket
sliterok commented
Fixed by installing Ubuntu instead of Mint.
Hugh Perkins commented
Interesting.