HPCToolkit / hpctoolkit

HPCToolkit performance tools: measurement and analysis components

hpcrun yields Segmentation fault depending on the event order

xeon-j opened this issue

-e BLOCKTIME -e gpu=nvidia works fine, but -e gpu=nvidia -e BLOCKTIME causes a segmentation fault at the end of the run.

Output from failed run (LAMMPS + Kokkos CUDA):

mpiexec -n 1 \
    hpcrun -e BLOCKTIME -e gpu=nvidia \
    ./lmp ...

[FPGA09:34461] *** Process received signal ***
[FPGA09:34461] Signal: Segmentation fault (11)
[FPGA09:34461] Signal code:  (128)
[FPGA09:34461] Failing at address: (nil)
[FPGA09:34461] [ 0] /opt/hpctoolkit/lib/hpctoolkit/ext-libs/libmonitor.so(+0x77a9)[0x7f83df3747a9]
[FPGA09:34461] [ 1] /lib/x86_64-linux-gnu/libpthread.so.0(+0x143c0)[0x7f83def3f3c0]
[FPGA09:34461] [ 2] /opt/hpctoolkit/lib/hpctoolkit/libhpcrun.so(hpcrun_get_num_metrics+0x40)[0x7f83df3c9be0]
[FPGA09:34461] [ 3] /opt/hpctoolkit/lib/hpctoolkit/libhpcrun.so(hpcrun_new_metric_data_list+0x37)[0x7f83df3ca1e7]
[FPGA09:34461] [ 4] /opt/hpctoolkit/lib/hpctoolkit/libhpcrun.so(hpcrun_metric_set_loc+0x5f)[0x7f83df3ca2af]
[FPGA09:34461] [ 5] /opt/hpctoolkit/lib/hpctoolkit/libhpcrun.so(hpcrun_metric_std+0x31)[0x7f83df3ca301]
[FPGA09:34461] [ 6] /opt/hpctoolkit/lib/hpctoolkit/libhpcrun.so(+0x22699)[0x7f83df3bb699]
[FPGA09:34461] [ 7] /opt/hpctoolkit/lib/hpctoolkit/libhpcrun.so(+0x20c8b)[0x7f83df3b9c8b]
[FPGA09:34461] [ 8] /opt/hpctoolkit/lib/hpctoolkit/libhpcrun.so(+0x1b365)[0x7f83df3b4365]
[FPGA09:34461] [ 9] /usr/local/cuda/lib64/libcupti.so(+0xf2b3b)[0x7f83dc78cb3b]
[FPGA09:34461] [10] /usr/local/cuda/lib64/libcupti.so(+0xf2ec7)[0x7f83dc78cec7]
[FPGA09:34461] [11] /usr/local/cuda/lib64/libcupti.so(+0xf512c)[0x7f83dc78f12c]
[FPGA09:34461] [12] /lib/x86_64-linux-gnu/libcuda.so(+0x34fb13)[0x7f83d6b1fb13]
[FPGA09:34461] [13] /usr/local/cuda/targets/x86_64-linux/lib/libcudart.so.11.0(cudaLaunchKernel+0x1b2)[0x7f83df1267e2]
[FPGA09:34461] [14] ./lmp(+0x2526d5e)[0x557f975c7d5e]
[FPGA09:34461] [15] ./lmp(+0x2522964)[0x557f975c3964]
[FPGA09:34461] [16] ./lmp(+0x25229cd)[0x557f975c39cd]
[FPGA09:34461] [17] ./lmp(+0x2527307)[0x557f975c8307]
[FPGA09:34461] [18] ./lmp(_ZN6Kokkos4Impl31CudaParallelLaunchKernelInvokerINS0_11ParallelForIN9LAMMPS_NS18FixQEqReaxFFKokkosINS_4CudaEEENS_11RangePolicyIJS5_NS3_30TagFixQEqReaxFFPackForwardCommEEEES5_EENS_12LaunchBoundsILj0ELj0EEELNS0_12Experimental19CudaLaunchMechanismE1EE13invoke_kernelERKSA_RK4dim3SK_iPKNS0_12CudaInternalE+0xda)[0x557f9762a290]
...

Typo in the example output: the failing command is hpcrun -e gpu=nvidia -e BLOCKTIME, not hpcrun -e BLOCKTIME -e gpu=nvidia.

What version of HPCToolkit are you using? What version of CUDA are you using?
Are you certain that the run is nearly complete when the failure occurs?

@laksono reproduced this issue on ufront

The BLOCKTIME event is built on the perf_events context-switch event, and interestingly I see no problem when profiling with that event directly.
Just to make sure, @xeon-j, can you try running with the context-switch event as follows:

hpcrun -e gpu=nvidia -e CS ...

If this works, the problem must be inside the BLOCKTIME event handler.
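
For reference, here is a minimal sketch of the perf_events context-switch event that BLOCKTIME and CS are built on. It uses the raw perf_event_open syscall purely for illustration; it is not hpcrun's implementation, and it may require a permissive perf_event_paranoid setting to run unprivileged:

/* Count context switches of the current process with perf_events.
 * Illustrative only: hpcrun's real handler samples, unwinds, and
 * attributes metrics rather than just counting. */
#include <linux/perf_event.h>
#include <asm/unistd.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <string.h>
#include <stdio.h>
#include <stdint.h>

int main(void) {
  struct perf_event_attr attr;
  memset(&attr, 0, sizeof(attr));
  attr.type = PERF_TYPE_SOFTWARE;
  attr.size = sizeof(attr);
  attr.config = PERF_COUNT_SW_CONTEXT_SWITCHES;
  attr.disabled = 1;                 /* enable explicitly below */

  /* pid = 0 (this process), cpu = -1 (any cpu), group_fd = -1, flags = 0 */
  int fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
  if (fd < 0) { perror("perf_event_open"); return 1; }

  ioctl(fd, PERF_EVENT_IOC_RESET, 0);
  ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);

  usleep(100000);                    /* do some work that may be descheduled */

  ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
  uint64_t count = 0;
  read(fd, &count, sizeof(count));
  printf("context switches: %llu\n", (unsigned long long)count);
  close(fd);
  return 0;
}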

@laksono the call stack seems to indicate a problem accessing the metric representation. I would check to make sure that the metrics are being finalized properly.

@jmellorcrummey You're right. Looking at @xeon-j's call stack, the problem must occur when accessing the metric representation.
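
To illustrate what "finalized properly" means here, consider a hypothetical sketch (illustrative names, not HPCToolkit's actual metric code): if per-node metric arrays are sized from a count that another component only fixes during finalization, flipping the registration order lets the consumer read a count that has not been set yet.

/* Hypothetical sketch of an initialization-order hazard; names are
 * illustrative, not HPCToolkit's. The consumer must not size its
 * metric array until the kind has been finalized. */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
  int num_metrics;   /* meaningful only once finalized != 0 */
  int finalized;
} kind_sketch;

static void kind_finalize(kind_sketch *k, int n) {
  k->num_metrics = n;
  k->finalized = 1;
}

static double *metric_values_new(const kind_sketch *k) {
  /* A guard like this would catch the flipped-order case instead of
   * letting a garbage count reach malloc/memset. */
  assert(k->finalized && "metric kind used before finalization");
  double *vals = malloc(k->num_metrics * sizeof(double));
  if (vals == NULL) return NULL;
  memset(vals, 0, k->num_metrics * sizeof(double));
  return vals;
}

int main(void) {
  kind_sketch gpu_kind = {0};
  kind_finalize(&gpu_kind, 4);        /* correct order: finalize first */
  double *vals = metric_values_new(&gpu_kind);
  printf("allocated %d metric slots\n", gpu_kind.num_metrics);
  free(vals);
  return 0;
}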

In my case I didn't encounter the segmentation fault.
On ufront, using CUDA 11.6 and the HPCToolkit master branch, the executable blocks indefinitely in a call to open64:

(gdb) bt
#0  monitor_signal_handler (sig=38, info=0x7fffffffa630, context=0x7fffffffa500) at signal.c:203
#1  <signal handler called>
#2  0x00007ffff417b381 in open64 () from /lib64/libc.so.6
#3  0x00007ffff410ac96 in _IO_file_open () from /lib64/libc.so.6
#4  0x00007ffff410ae4b in __GI__IO_file_fopen () from /lib64/libc.so.6
#5  0x00007ffff40ff06d in __fopen_internal () from /lib64/libc.so.6
#6  0x00007ffff58947fd in ?? () from /lib64/libcuda.so.1
#7  0x00007ffff5802316 in ?? () from /lib64/libcuda.so.1
#8  0x00007ffff5886772 in ?? () from /lib64/libcuda.so.1
#9  0x00007ffff6e8b953 in ?? () from /usr/local/cuda/targets/x86_64-linux/lib/libcudart.so.11.0
#10 0x00007ffff6e8e6c8 in ?? () from /usr/local/cuda/targets/x86_64-linux/lib/libcudart.so.11.0
#11 0x00007ffff4e41dd7 in __pthread_once_slow () from /lib64/libpthread.so.0
#12 0x00007ffff6ed2ce9 in ?? () from /usr/local/cuda/targets/x86_64-linux/lib/libcudart.so.11.0
#13 0x00007ffff6e80737 in ?? () from /usr/local/cuda/targets/x86_64-linux/lib/libcudart.so.11.0
#14 0x00007ffff6ea166c in cudaDeviceReset () from /usr/local/cuda/targets/x86_64-linux/lib/libcudart.so.11.0
#15 0x0000000000400def in cuda_init_device (device_num=0) at ../utils/common.h:59
#16 main (argc=<optimized out>, argv=<optimized out>) at main.cu:52

I can reproduce this segmentation fault on llnl with CUDA 11.0:

(gdb) bt
#0  0x00007fffb085ee50 in __memset_power8 () from /lib64/libc.so.6
#1  0x00007fffb0e3ef10 in hpcrun_new_metric_data_list (metric_id=45) at ../../../../src/tool/hpcrun/metrics.c:514
#2  0x00007fffb0e3eb34 in hpcrun_metric_set_loc (rv=0x7fffa1401860, id=45) at ../../../../src/tool/hpcrun/metrics.c:448
#3  0x00007fffb0e3ec18 in hpcrun_metric_std (metric_id=45, set=0x7fffa1401860, operation=43 '+', val=...) at ../../../../src/tool/hpcrun/metrics.c:466
#4  0x00007fffb0e3edd0 in hpcrun_metric_std_inc (metric_id=45, set=0x7fffa1401860, incr=...) at ../../../../src/tool/hpcrun/metrics.c:500
#5  0x00007fffb0e21d48 in gpu_metrics_attribute_metric_int (metrics=0x7fffa1401860, metric_index=45, value=311951360) at ../../../../src/tool/hpcrun/gpu/gpu-metrics.c:233
#6  0x00007fffb0e22510 in gpu_metrics_attribute_kernel (activity=0x7fffa1098e88) at ../../../../src/tool/hpcrun/gpu/gpu-metrics.c:440
#7  0x00007fffb0e22db8 in gpu_metrics_attribute (activity=0x7fffa1098e88) at ../../../../src/tool/hpcrun/gpu/gpu-metrics.c:686
#8  0x00007fffb0e1d0e4 in gpu_activity_consume (activity=0x7fffa1098e88, aa_fn=0x7fffb0e22c9c <gpu_metrics_attribute>) at ../../../../src/tool/hpcrun/gpu/gpu-activity.c:119
#9  0x00007fffb0e1cfcc in gpu_activity_channel_consume_with_idx (idx=0, aa_fn=0x7fffb0e22c9c <gpu_metrics_attribute>) at ../../../../src/tool/hpcrun/gpu/gpu-activity-channel.c:196
#10 0x00007fffb0e1cf30 in gpu_activity_channel_consume (aa_fn=0x7fffb0e22c9c <gpu_metrics_attribute>) at ../../../../src/tool/hpcrun/gpu/gpu-activity-channel.c:177
#11 0x00007fffb0e1e90c in gpu_application_thread_process_activities () at ../../../../src/tool/hpcrun/gpu/gpu-application-thread-api.c:78
#12 0x00007fffb0e1453c in cupti_device_flush (args=0x0, how=1) at ../../../../src/tool/hpcrun/gpu/nvidia/cupti-api.c:1687
#13 0x00007fffb0e152f4 in device_finalizer_apply (type=device_finalizer_type_flush, how=1) at ../../../../src/tool/hpcrun/device-finalizers.c:24
#14 0x00007fffb0e391d0 in hpcrun_fini_internal () at ../../../../src/tool/hpcrun/main.c:730
#15 0x00007fffb0e39db0 in monitor_fini_process (how=1, data=0x0) at ../../../../src/tool/hpcrun/main.c:1056
#16 0x00007fffb0db2430 in monitor_end_process_fcn (how=<optimized out>) at main.c:321
#17 0x00007fffb0db2840 in monitor_main_fence3 () at main.c:533
#18 0x00007fffb07d5280 in generic_start_main.isra.0 () from /lib64/libc.so.6
#19 0x00007fffb07d5474 in __libc_start_main () from /lib64/libc.so.6
#20 0x00007fffb0db1844 in __libc_start_main (argc=<optimized out>, argv=0x7ffffb764308, envp=0x7ffffb764318, auxp=0x7ffffb764550, rtld_fini=0x7fffb12b79a0 <_dl_fini>, stinfo=<optimized out>,
    stack_end=0x7ffffb764280) at main.c:563
#21 0x0000000000000000 in ?? ()

The problem is at metrics.c:514, where the value of n_metrics is -370429952.

(gdb) fr 1
#1  0x00007fffb0e3ef10 in hpcrun_new_metric_data_list (metric_id=45) at ../../../../src/tool/hpcrun/metrics.c:514
514       memset(curr->metrics, 0, n_metrics * sizeof(hpcrun_metricVal_t));
(gdb) l
509       int n_metrics = hpcrun_get_num_metrics(curr->kind);
510       curr->metrics = hpcrun_malloc(n_metrics * sizeof(hpcrun_metricVal_t));
511       // FIXME(Keren): duplicate?
512       for (int i = 0; i < n_metrics; i++)
513         curr->metrics[i].v1 = curr->kind->null_metrics[i];
514       memset(curr->metrics, 0, n_metrics * sizeof(hpcrun_metricVal_t));
515       curr->next = NULL;
516       return curr;
517     }
518
(gdb) p curr
$1 = (metric_data_list_t *) 0x7fffa1401788
(gdb) p n_metrics
$2 = -370429952
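
To make the magnitude concrete, here is a standalone sketch of the size computation at metrics.c:514 (treating hpcrun_metricVal_t as 8 bytes is an assumption about this target):

/* The memset length is n_metrics * sizeof(hpcrun_metricVal_t) with
 * n_metrics an int. The int is converted to size_t before the multiply,
 * so a negative count becomes an astronomically large length. */
#include <stdio.h>
#include <stddef.h>

int main(void) {
  int n_metrics = -370429952;     /* value observed in gdb */
  size_t elem = 8;                /* assumed sizeof(hpcrun_metricVal_t) */
  size_t len = n_metrics * elem;  /* int converted to size_t: wraps */
  printf("memset length = %zu bytes (~%.1f exabytes)\n", len, len / 1e18);
  return 0;
}

So memset is asked to zero roughly 18 exabytes and faults immediately. The garbage count also suggests that curr->kind, or the kind's metric count, is being read before it has been set up, which again points at the order in which the events and their metrics are registered.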

Perhaps @Jokeren has an idea?

Unfortunately I don't have any idea based on the backtrace.

The command you used is hpcrun -e gpu=nvidia -e BLOCKTIME, right?