HPCToolkit / hpctoolkit

HPCToolkit performance tools: measurement and analysis components

hpcrun yields Segmentation fault depending on the event order

xeon-j opened this issue

-e BLOCKTIME -e gpu=nvidia works fine, but -e gpu=nvidia -e BLOCKTIME causes a segmentation fault at the end of the run.

Output from failed run (LAMMPS + Kokkos CUDA):

mpiexec -n 1 \
    hpcrun -e BLOCKTIME -e gpu=nvidia \
    ./lmp ...

[FPGA09:34461] *** Process received signal ***
[FPGA09:34461] Signal: Segmentation fault (11)
[FPGA09:34461] Signal code:  (128)
[FPGA09:34461] Failing at address: (nil)
[FPGA09:34461] [ 0] /opt/hpctoolkit/lib/hpctoolkit/ext-libs/libmonitor.so(+0x77a9)[0x7f83df3747a9]
[FPGA09:34461] [ 1] /lib/x86_64-linux-gnu/libpthread.so.0(+0x143c0)[0x7f83def3f3c0]
[FPGA09:34461] [ 2] /opt/hpctoolkit/lib/hpctoolkit/libhpcrun.so(hpcrun_get_num_metrics+0x40)[0x7f83df3c9be0]
[FPGA09:34461] [ 3] /opt/hpctoolkit/lib/hpctoolkit/libhpcrun.so(hpcrun_new_metric_data_list+0x37)[0x7f83df3ca1e7]
[FPGA09:34461] [ 4] /opt/hpctoolkit/lib/hpctoolkit/libhpcrun.so(hpcrun_metric_set_loc+0x5f)[0x7f83df3ca2af]
[FPGA09:34461] [ 5] /opt/hpctoolkit/lib/hpctoolkit/libhpcrun.so(hpcrun_metric_std+0x31)[0x7f83df3ca301]
[FPGA09:34461] [ 6] /opt/hpctoolkit/lib/hpctoolkit/libhpcrun.so(+0x22699)[0x7f83df3bb699]
[FPGA09:34461] [ 7] /opt/hpctoolkit/lib/hpctoolkit/libhpcrun.so(+0x20c8b)[0x7f83df3b9c8b]
[FPGA09:34461] [ 8] /opt/hpctoolkit/lib/hpctoolkit/libhpcrun.so(+0x1b365)[0x7f83df3b4365]
[FPGA09:34461] [ 9] /usr/local/cuda/lib64/libcupti.so(+0xf2b3b)[0x7f83dc78cb3b]
[FPGA09:34461] [10] /usr/local/cuda/lib64/libcupti.so(+0xf2ec7)[0x7f83dc78cec7]
[FPGA09:34461] [11] /usr/local/cuda/lib64/libcupti.so(+0xf512c)[0x7f83dc78f12c]
[FPGA09:34461] [12] /lib/x86_64-linux-gnu/libcuda.so(+0x34fb13)[0x7f83d6b1fb13]
[FPGA09:34461] [13] /usr/local/cuda/targets/x86_64-linux/lib/libcudart.so.11.0(cudaLaunchKernel+0x1b2)[0x7f83df1267e2]
[FPGA09:34461] [14] ./lmp(+0x2526d5e)[0x557f975c7d5e]
[FPGA09:34461] [15] ./lmp(+0x2522964)[0x557f975c3964]
[FPGA09:34461] [16] ./lmp(+0x25229cd)[0x557f975c39cd]
[FPGA09:34461] [17] ./lmp(+0x2527307)[0x557f975c8307]
[FPGA09:34461] [18] ./lmp(_ZN6Kokkos4Impl31CudaParallelLaunchKernelInvokerINS0_11ParallelForIN9LAMMPS_NS18FixQEqReaxFFKokkosINS_4CudaEEENS_11RangePolicyIJS5_NS3_30TagFixQEqReaxFFPackForwardCommEEEES5_EENS_12LaunchBoundsILj0ELj0EEELNS0_12Experimental19CudaLaunchMechanismE1EE13invoke_kernelERKSA_RK4dim3SK_iPKNS0_12CudaInternalE+0xda)[0x557f9762a290]
...

Typo in the example output: the failing command is hpcrun -e gpu=nvidia -e BLOCKTIME, not hpcrun -e BLOCKTIME -e gpu=nvidia.

What version of HPCToolkit are you using? What version of CUDA are you using?
Are you certain that the run is nearly complete when the failure occurs?

@laksono reproduced this issue on ufront

The BLOCKTIME event is built on the perf_events context-switch event, and interestingly I see no problem when profiling with that event directly.
Just to make sure, @xeon-j, can you try running with the context-switch event as follows:

hpcrun -e gpu=nvidia -e CS ...

If this works, the problem must be inside the BLOCKTIME event handler.
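
For reference, here is a minimal sketch of the perf_events context-switch event that BLOCKTIME and CS are built on. It uses the raw perf_event_open syscall purely for illustration; it is not hpcrun's implementation, and it may require a permissive perf_event_paranoid setting to run unprivileged:

/* Count context switches of the current process with perf_events.
 * Illustrative only: hpcrun's real handler samples, unwinds, and
 * attributes metrics rather than just counting. */
#include <linux/perf_event.h>
#include <asm/unistd.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <string.h>
#include <stdio.h>
#include <stdint.h>

int main(void) {
  struct perf_event_attr attr;
  memset(&attr, 0, sizeof(attr));
  attr.type = PERF_TYPE_SOFTWARE;
  attr.size = sizeof(attr);
  attr.config = PERF_COUNT_SW_CONTEXT_SWITCHES;
  attr.disabled = 1;                 /* enable explicitly below */

  /* pid = 0 (this process), cpu = -1 (any cpu), group_fd = -1, flags = 0 */
  int fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
  if (fd < 0) { perror("perf_event_open"); return 1; }

  ioctl(fd, PERF_EVENT_IOC_RESET, 0);
  ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);

  usleep(100000);                    /* do some work that may be descheduled */

  ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
  uint64_t count = 0;
  read(fd, &count, sizeof(count));
  printf("context switches: %llu\n", (unsigned long long)count);
  close(fd);
  return 0;
}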

@laksono the call stack seems to indicate a problem accessing the metric representation. I would check to make sure that the metrics are being finalized properly.

@jmellorcrummey You're right. Looking at @xeon-j's call stack, the problem must occur when accessing the metric representation.
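
To illustrate what "finalized properly" means here, consider a hypothetical sketch (illustrative names, not HPCToolkit's actual metric code): if per-node metric arrays are sized from a count that another component only fixes during finalization, flipping the registration order lets the consumer read a count that has not been set yet.

/* Hypothetical sketch of an initialization-order hazard; names are
 * illustrative, not HPCToolkit's. The consumer must not size its
 * metric array until the kind has been finalized. */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
  int num_metrics;   /* meaningful only once finalized != 0 */
  int finalized;
} kind_sketch;

static void kind_finalize(kind_sketch *k, int n) {
  k->num_metrics = n;
  k->finalized = 1;
}

static double *metric_values_new(const kind_sketch *k) {
  /* A guard like this would catch the flipped-order case instead of
   * letting a garbage count reach malloc/memset. */
  assert(k->finalized && "metric kind used before finalization");
  double *vals = malloc(k->num_metrics * sizeof(double));
  if (vals == NULL) return NULL;
  memset(vals, 0, k->num_metrics * sizeof(double));
  return vals;
}

int main(void) {
  kind_sketch gpu_kind = {0};
  kind_finalize(&gpu_kind, 4);        /* correct order: finalize first */
  double *vals = metric_values_new(&gpu_kind);
  printf("allocated %d metric slots\n", gpu_kind.num_metrics);
  free(vals);
  return 0;
}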

In my case I didn't encounter the segmentation fault.
On ufront, using CUDA 11.6 and the HPCToolkit master branch, the executable blocks indefinitely in a call to open64:

(gdb) bt
#0  monitor_signal_handler (sig=38, info=0x7fffffffa630, context=0x7fffffffa500) at signal.c:203
#1  <signal handler called>
#2  0x00007ffff417b381 in open64 () from /lib64/libc.so.6
#3  0x00007ffff410ac96 in _IO_file_open () from /lib64/libc.so.6
#4  0x00007ffff410ae4b in __GI__IO_file_fopen () from /lib64/libc.so.6
#5  0x00007ffff40ff06d in __fopen_internal () from /lib64/libc.so.6
#6  0x00007ffff58947fd in ?? () from /lib64/libcuda.so.1
#7  0x00007ffff5802316 in ?? () from /lib64/libcuda.so.1
#8  0x00007ffff5886772 in ?? () from /lib64/libcuda.so.1
#9  0x00007ffff6e8b953 in ?? () from /usr/local/cuda/targets/x86_64-linux/lib/libcudart.so.11.0
#10 0x00007ffff6e8e6c8 in ?? () from /usr/local/cuda/targets/x86_64-linux/lib/libcudart.so.11.0
#11 0x00007ffff4e41dd7 in __pthread_once_slow () from /lib64/libpthread.so.0
#12 0x00007ffff6ed2ce9 in ?? () from /usr/local/cuda/targets/x86_64-linux/lib/libcudart.so.11.0
#13 0x00007ffff6e80737 in ?? () from /usr/local/cuda/targets/x86_64-linux/lib/libcudart.so.11.0
#14 0x00007ffff6ea166c in cudaDeviceReset () from /usr/local/cuda/targets/x86_64-linux/lib/libcudart.so.11.0
#15 0x0000000000400def in cuda_init_device (device_num=0) at ../utils/common.h:59
#16 main (argc=<optimized out>, argv=<optimized out>) at main.cu:52

I can reproduce this segmentation fault on llnl with CUDA 11.0:

(gdb) bt
#0  0x00007fffb085ee50 in __memset_power8 () from /lib64/libc.so.6
#1  0x00007fffb0e3ef10 in hpcrun_new_metric_data_list (metric_id=45) at ../../../../src/tool/hpcrun/metrics.c:514
#2  0x00007fffb0e3eb34 in hpcrun_metric_set_loc (rv=0x7fffa1401860, id=45) at ../../../../src/tool/hpcrun/metrics.c:448
#3  0x00007fffb0e3ec18 in hpcrun_metric_std (metric_id=45, set=0x7fffa1401860, operation=43 '+', val=...) at ../../../../src/tool/hpcrun/metrics.c:466
#4  0x00007fffb0e3edd0 in hpcrun_metric_std_inc (metric_id=45, set=0x7fffa1401860, incr=...) at ../../../../src/tool/hpcrun/metrics.c:500
#5  0x00007fffb0e21d48 in gpu_metrics_attribute_metric_int (metrics=0x7fffa1401860, metric_index=45, value=311951360) at ../../../../src/tool/hpcrun/gpu/gpu-metrics.c:233
#6  0x00007fffb0e22510 in gpu_metrics_attribute_kernel (activity=0x7fffa1098e88) at ../../../../src/tool/hpcrun/gpu/gpu-metrics.c:440
#7  0x00007fffb0e22db8 in gpu_metrics_attribute (activity=0x7fffa1098e88) at ../../../../src/tool/hpcrun/gpu/gpu-metrics.c:686
#8  0x00007fffb0e1d0e4 in gpu_activity_consume (activity=0x7fffa1098e88, aa_fn=0x7fffb0e22c9c <gpu_metrics_attribute>) at ../../../../src/tool/hpcrun/gpu/gpu-activity.c:119
#9  0x00007fffb0e1cfcc in gpu_activity_channel_consume_with_idx (idx=0, aa_fn=0x7fffb0e22c9c <gpu_metrics_attribute>) at ../../../../src/tool/hpcrun/gpu/gpu-activity-channel.c:196
#10 0x00007fffb0e1cf30 in gpu_activity_channel_consume (aa_fn=0x7fffb0e22c9c <gpu_metrics_attribute>) at ../../../../src/tool/hpcrun/gpu/gpu-activity-channel.c:177
#11 0x00007fffb0e1e90c in gpu_application_thread_process_activities () at ../../../../src/tool/hpcrun/gpu/gpu-application-thread-api.c:78
#12 0x00007fffb0e1453c in cupti_device_flush (args=0x0, how=1) at ../../../../src/tool/hpcrun/gpu/nvidia/cupti-api.c:1687
#13 0x00007fffb0e152f4 in device_finalizer_apply (type=device_finalizer_type_flush, how=1) at ../../../../src/tool/hpcrun/device-finalizers.c:24
#14 0x00007fffb0e391d0 in hpcrun_fini_internal () at ../../../../src/tool/hpcrun/main.c:730
#15 0x00007fffb0e39db0 in monitor_fini_process (how=1, data=0x0) at ../../../../src/tool/hpcrun/main.c:1056
#16 0x00007fffb0db2430 in monitor_end_process_fcn (how=<optimized out>) at main.c:321
#17 0x00007fffb0db2840 in monitor_main_fence3 () at main.c:533
#18 0x00007fffb07d5280 in generic_start_main.isra.0 () from /lib64/libc.so.6
#19 0x00007fffb07d5474 in __libc_start_main () from /lib64/libc.so.6
#20 0x00007fffb0db1844 in __libc_start_main (argc=<optimized out>, argv=0x7ffffb764308, envp=0x7ffffb764318, auxp=0x7ffffb764550, rtld_fini=0x7fffb12b79a0 <_dl_fini>, stinfo=<optimized out>,
    stack_end=0x7ffffb764280) at main.c:563
#21 0x0000000000000000 in ?? ()

The problem is at metrics.c:514, where the value of n_metrics is -370429952.

(gdb) fr 1
#1  0x00007fffb0e3ef10 in hpcrun_new_metric_data_list (metric_id=45) at ../../../../src/tool/hpcrun/metrics.c:514
514       memset(curr->metrics, 0, n_metrics * sizeof(hpcrun_metricVal_t));
(gdb) l
509       int n_metrics = hpcrun_get_num_metrics(curr->kind);
510       curr->metrics = hpcrun_malloc(n_metrics * sizeof(hpcrun_metricVal_t));
511       // FIXME(Keren): duplicate?
512       for (int i = 0; i < n_metrics; i++)
513         curr->metrics[i].v1 = curr->kind->null_metrics[i];
514       memset(curr->metrics, 0, n_metrics * sizeof(hpcrun_metricVal_t));
515       curr->next = NULL;
516       return curr;
517     }
518
(gdb) p curr
$1 = (metric_data_list_t *) 0x7fffa1401788
(gdb) p n_metrics
$2 = -370429952
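
To make the magnitude concrete, here is a standalone sketch of the size computation at metrics.c:514 (treating hpcrun_metricVal_t as 8 bytes is an assumption about this target):

/* The memset length is n_metrics * sizeof(hpcrun_metricVal_t) with
 * n_metrics an int. The int is converted to size_t before the multiply,
 * so a negative count becomes an astronomically large length. */
#include <stdio.h>
#include <stddef.h>

int main(void) {
  int n_metrics = -370429952;     /* value observed in gdb */
  size_t elem = 8;                /* assumed sizeof(hpcrun_metricVal_t) */
  size_t len = n_metrics * elem;  /* int converted to size_t: wraps */
  printf("memset length = %zu bytes (~%.1f exabytes)\n", len, len / 1e18);
  return 0;
}

So memset is asked to zero roughly 18 exabytes and faults immediately. The garbage count also suggests that curr->kind, or the kind's metric count, is being read before it has been set up, which again points at the order in which the events and their metrics are registered.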

Perhaps @Jokeren has an idea?

Unfortunately I don't have any idea based on the backtrace.

The command you used is hpcrun -e gpu=nvidia -e BLOCKTIME, right?