ROCm / ROCgdb

This is ROCgdb, the ROCm source-level debugger for Linux, based on GDB, the GNU source-level debugger.

Home Page: https://rocm.docs.amd.com/projects/ROCgdb/en/latest/

Symbol debugging for kernels

hgtsoi opened this issue

Do we have symbol debugging for GPU kernels now?
I am using ROCgdb shipped with rocm-4.5.2. When checking local variables or args in rocgdb, it always shows "Optimized".
I am wondering whether they were indeed optimized out, or whether rocgdb does not support symbolic debugging for kernels in rocm-4.5.2.

The feature doesn't appear to be present in ROCM 5.0.2 on the gfx90a architecture either. I've compiled my code using hipcc with as many debug compiler flags as I could find, like this:

-g -ggdb3 --offload-arch=gfx90a -Xarch_device -ggdb3 -Xarch_device -g -Xarch_device -O0 -Xclang -O0 -fstandalone-debug -gdwarf-5

Even with the above flags I still can't use rocgdb to print the values of private kernel variables from within a lane; it always returns <optimized out>. Is printing variables from within a kernel actually supported at the moment?

The current ROCm 5.4 release does support function local variable printing if compiled with -O0 -g. Support for local/shared address space variables should be added in an upcoming release. You do need to have the focus on a specific AMD GPU thread and lane to see the values for that lane.
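For reference, the -O0 -g build mentioned above can be wrapped in a small shell helper. This is only a sketch: the function name debug_build_cmd and the file name kernel.cpp are hypothetical, and the helper only prints the hipcc command line instead of running it, so it can be checked on a machine without ROCm. The optional --offload-arch flag is the one used earlier in this thread.

```shell
#!/bin/sh
# Print (do not run) a hipcc command line for a debug build.
#   $1: source file (hypothetical name in the examples below)
#   $2: output binary
#   $3: optional GPU architecture, e.g. gfx90a
debug_build_cmd() {
    src=$1
    out=$2
    arch=${3:-}
    cmd="hipcc -g -O0"
    # Only add --offload-arch when an architecture was given.
    if [ -n "$arch" ]; then
        cmd="$cmd --offload-arch=$arch"
    fi
    printf '%s %s -o %s\n' "$cmd" "$src" "$out"
}

debug_build_cmd kernel.cpp a.out
debug_build_cmd kernel.cpp a.out gfx90a
```

The printed line can be pasted into a terminal on a machine that has hipcc installed.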

Hey folks,

Really looking forward to this feature becoming available! I just tried printing private kernel variables from within a lane, using rocgdb from ROCM 5.4.3 on the officially supported GFX906 architecture. It doesn't look like it's implemented yet; however, if you use the CUDA backend with HIP you can print private kernel variables under cuda-gdb.

Printing variables allocated in private memory has been supported for some time now if compiled with -g -O0. If you have a case where it is not, it would be good to see a reproducer we can investigate.

Hi t-tye,

Absolutely, attached is a self-contained matrix multiplication code that reproduces the problem. I have tried this workflow using ROCM 5.4.3 on GFX906 and ROCM 5.0.2 on GFX90a.

mat_mult_bugreport.cpp.txt

The goal is to print kernel variables i0 and i1 from within the kernel mat_mult, on line 168.

  1. Compile the code with the suggested flags. I have also tried adding the flag -ggdb, but got the same outcome.

hipcc -g -O0 mat_mult_bugreport.cpp -o a.out

  2. Run rocgdb

rocgdb ./a.out

  3. From within rocgdb, set a breakpoint for the kernel

b mat_mult

  4. If I then execute run on GFX906 with ROCM 5.4.3, it skips over the breakpoint for an unknown reason and finishes

run

warning: Temporarily disabling breakpoints for unloaded shared library "/a.out#offset=12288&size=99280"

On GFX90a with ROCM 5.0.2 it hits the breakpoint and I can continue...

  5. Disable the breakpoint so we don't hit it again after we step into a lane

disable

  6. Get which thread is running block (0,0,0)

info threads

  7. Change to the thread that is running block (0,0,0)

thread 4 (this might be different for you)

  8. Switch to lane 0 in the wavefront

lane 0

  9. Walk through a few lines

n

 10. Try to print variable i0

print i0

On GFX90a with ROCM 5.0.2 I get this output

$1 = <unavailable>
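The steps above can also be captured as a rocgdb command file so that the session is easy to replay. The following is a minimal sketch under several assumptions: the helper name write_rocgdb_script and the file name repro.gdb are hypothetical, the thread number has to be read from info threads in an earlier interactive run, and set breakpoint pending on is added on the assumption that the kernel symbol is not yet loaded when the breakpoint is set in a non-interactive session. The script only writes the file; it does not start the debugger.

```shell
#!/bin/sh
# Write a rocgdb command file that replays the reproduction steps.
#   $1: kernel name            $2: thread number (from "info threads")
#   $3: lane within the wave   $4: variable to print
#   $5: path of the command file to create
write_rocgdb_script() {
    kernel=$1
    thread=$2
    lane=$3
    var=$4
    out=$5
    cat > "$out" <<EOF
set breakpoint pending on
break $kernel
run
disable
info threads
thread $thread
lane $lane
next
print $var
EOF
}

# Values taken from the steps above; thread 4 may differ on another machine.
write_rocgdb_script mat_mult 4 0 i0 repro.gdb
cat repro.gdb
```

On a machine with ROCm installed, the file could then be replayed with rocgdb -batch -x repro.gdb ./a.out. The -batch and -x options are standard GDB options and are assumed here to behave the same way in rocgdb.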

At present, with the versions of ROCM and architectures available to me, there seems to be no straightforward way I can get kernel variables to print. I know we can inspect registers and have a look at the assembly (whose instructions don't appear to be documented publicly at all), but this is too much of an ask for average researcher folk to delve into.

If this works at all I am curious to know which version of ROCM and architecture it does work on. I am giving a workshop in HIP soon with a focus on supercomputing. Having this feature available (or at least giving them hope as to what version they can expect it to work) would be so wonderful for the students/researchers.

@hgtsoi Do you still need assistance with this ticket? Thanks