Problem when debugging cuda kernel functions.

Question

dongrixinyu opened this issue a year ago · comments

I want to debug CUDA kernel functions such as __global__ void layernorm_forward_kernel3.

So I compiled the project with the command below, replacing -O3 with -g -G.

$ /usr/local/cuda-11.7/bin/nvcc --threads=0 -t=0 --use_fast_math -std=c++17 -g -G -DMULTI_GPU -DUSE_MPI train_gpt2_fp32.cu -lcublas -lcublasLt -lnvidia-ml -L/usr/lib/x86_64-linux-gnu/openmpi/lib/  -I/usr/lib/x86_64-linux-gnu/openmpi/include/  -lnccl -lmpi -o train_gpt2fp32cu

Then I configured the VS Code launch.json like this:

{
    // Use IntelliSense to learn about possible attributes.
    // Hover to view descriptions of existing attributes.
    // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
    "version": "0.2.0",
    "configurations": [
        {
            "name": "CUDA C++: Launch",
            "type": "cuda-gdb",
            "request": "launch",
            "program": "/root/github/llm.c/train_gpt2fp32cu",
            "cwd": "/root/github/llm.c/",
            // "args": [],
            // "stopAtEntry": false,
            // "MIMode": "cuda-gdb",
            // "miDebuggerPath": "/usr/local/cuda-11.7/bin/cuda-gdb",
        },
    ]
}

But the debugger failed to step into the kernel functions, so I have no idea how to configure VS Code for CUDA debugging.

Abhinav Pandey · Answer 1 · Sun Sep 29 2024 05:27:37 GMT+0800 (China Standard Time)

You've already taken the correct first step by compiling your code with -g -G.

VS Code setup:

Install the CUDA extension for VS Code:
- Open the Extensions view (Ctrl+Shift+X)
- Search for "CUDA"
- Install the "Nsight Visual Studio Code Edition" extension published by NVIDIA

Configure launch.json:
Your current launch.json is close, but needs some adjustments. Here's an improved version:

{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "CUDA C++: Launch",
            "type": "cuda-gdb",
            "request": "launch",
            "program": "${workspaceFolder}/train_gpt2fp32cu",
            "args": [],
            "stopAtEntry": true,
            "cwd": "${workspaceFolder}",
            "environment": [],
            "externalConsole": false,
            "MIMode": "cuda-gdb",
            "miDebuggerPath": "/usr/local/cuda-11.7/bin/cuda-gdb",
            "setupCommands": [
                {
                    "description": "Enable pretty-printing for cuda-gdb",
                    "text": "-enable-pretty-printing",
                    "ignoreFailures": true
                }
            ]
        }
    ]
}

Make sure to adjust the program and miDebuggerPath fields to match your specific file locations.

Debugging process:

Set breakpoints:
- Open your CUDA source file
- Click in the gutter (left of the line numbers) to set breakpoints on the kernel launch and within the kernel function
Start debugging:
- Go to the Run and Debug view (Ctrl+Shift+D)
- Select your "CUDA C++: Launch" configuration
- Press F5 or click the green play button to start debugging
Debug your kernel:
- The debugger should stop at your first breakpoint
- Use the debug toolbar to step through your code, including into the kernel functions
- You can inspect variables, watch expressions, and use the debug console as needed

For CUDA-specific debugging, use cuda-gdb commands such as cuda thread to focus on specific threads within your kernel, examine GPU memory, and add printf statements within your kernels. Check for CUDA errors after kernel launches and CUDA API calls.
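The printf and error-checking advice above can be sketched as follows. This is only a minimal illustration, not code from llm.c: the CUDA_CHECK macro and scale_kernel are hypothetical names, and the toy kernel merely stands in for a real one such as layernorm_forward_kernel3.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro (not from llm.c): abort with file/line on any CUDA error.
#define CUDA_CHECK(call)                                               \
    do {                                                               \
        cudaError_t err_ = (call);                                     \
        if (err_ != cudaSuccess) {                                     \
            fprintf(stderr, "CUDA error %s at %s:%d\n",                \
                    cudaGetErrorString(err_), __FILE__, __LINE__);     \
            exit(EXIT_FAILURE);                                        \
        }                                                              \
    } while (0)

// Hypothetical toy kernel standing in for a real kernel under investigation.
__global__ void scale_kernel(float* data, float factor, int n) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) {
        data[idx] *= factor;
        // Print from a single thread only, to keep the output readable.
        if (idx == 0) {
            printf("block %d thread %d: data[0] = %f\n",
                   blockIdx.x, threadIdx.x, data[0]);
        }
    }
}

int main() {
    const int n = 256;
    float* d_data = nullptr;
    CUDA_CHECK(cudaMalloc(&d_data, n * sizeof(float)));
    CUDA_CHECK(cudaMemset(d_data, 0, n * sizeof(float)));

    scale_kernel<<<(n + 127) / 128, 128>>>(d_data, 2.0f, n);
    CUDA_CHECK(cudaGetLastError());       // reports launch-configuration errors
    CUDA_CHECK(cudaDeviceSynchronize());  // reports errors raised while the kernel ran
                                          // and flushes device-side printf output

    CUDA_CHECK(cudaFree(d_data));
    return 0;
}
```

Checking both cudaGetLastError() and cudaDeviceSynchronize() matters because kernel launches are asynchronous: a bad launch configuration shows up immediately, while a fault inside the kernel only surfaces at the next synchronization point.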

If you're still having trouble stepping into the kernel functions, try the following:

Verify that your CUDA toolkit and VS Code extension versions are compatible.
Ensure that you're running VS Code with sufficient permissions to debug CUDA applications.
Try adding a break command to your setupCommands to break on kernel entry (cuda-gdb sets kernel breakpoints with the regular gdb break command):

"setupCommands": [
    {
        "description": "Break on kernel entry",
        "text": "break layernorm_forward_kernel3"
    }
]

Replace layernorm_forward_kernel3 with the actual name of your kernel function.
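For the version check mentioned above, a small standalone program can report what the toolkit and driver actually are on the machine, which you can then compare against the extension's requirements. This is a rough sketch using the standard CUDA runtime calls; it assumes device 0 is the GPU you want to debug on.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int runtime_version = 0;
    int driver_version = 0;

    // Both calls encode the version as 1000 * major + 10 * minor.
    if (cudaRuntimeGetVersion(&runtime_version) != cudaSuccess ||
        cudaDriverGetVersion(&driver_version) != cudaSuccess) {
        fprintf(stderr, "Could not query CUDA versions\n");
        return 1;
    }
    printf("CUDA runtime: %d.%d\n",
           runtime_version / 1000, (runtime_version % 1000) / 10);
    printf("CUDA driver supports up to: %d.%d\n",
           driver_version / 1000, (driver_version % 1000) / 10);

    // Assumption: device 0 is the GPU used for debugging.
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties failed: %s\n",
                cudaGetErrorString(err));
        return 1;
    }
    printf("Device 0: %s (compute capability %d.%d)\n",
           prop.name, prop.major, prop.minor);
    return 0;
}
```

If the driver version is lower than the runtime version, the program (and the debugger) may fail before any kernel is reached, so this is worth ruling out first.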

Debugging CUDA kernels can be slower than CPU debugging due to the communication between the host and device. Be patient!

dongrixinyu · Answer 2 · Sun Sep 29 2024 10:38:06 GMT+0800 (China Standard Time)

Thanks very much. I tried it, but it still failed.

There are 2 problems.

First, some keys in the launch.json you provided are not allowed; they are underlined with yellow squiggly lines.

Second, when I debug with this config, the console prints a message and then the process exits.

These two points confuse me. Have you ever encountered this situation?

dongrixinyu · Answer 3 · Mon Sep 30 2024 11:27:31 GMT+0800 (China Standard Time)

I figured it out. The cwd parameter caused it.