HPCToolkit / hpctoolkit

Flushing GPU records by calling routines such as cuptiActivityFlush and ompt_trace_flush has caused problems when HPCToolkit's monitoring has been turned on for certain threads.

To address this issue, I propose we add a tracking mechanism for each of the flavors of GPU events we might flush:
CUDA
ROCM
OMPT

I think that HPCToolkit already has an accounting mechanism for OpenCL and Level 0 to make sure that completion events match the number of submitted operations.
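As a generic illustration of this kind of accounting, here is a minimal C11 sketch. It is not HPCToolkit's actual OpenCL or Level 0 code, which may work differently, and the type and function names (gpu_op_accounting_t, op_submitted, op_completed, ops_outstanding, all_ops_completed) are hypothetical. Operations are counted when they are submitted, completion events are counted as they arrive, and the two counts are compared.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical accounting record, kept per thread or per queue. */
typedef struct {
  atomic_long submitted; /* operations handed to the GPU runtime */
  atomic_long completed; /* completion events observed so far */
} gpu_op_accounting_t;

static void accounting_init(gpu_op_accounting_t *a) {
  atomic_init(&a->submitted, 0);
  atomic_init(&a->completed, 0);
}

/* Called on the submission path (kernel launch, copy, ...). */
static void op_submitted(gpu_op_accounting_t *a) {
  atomic_fetch_add(&a->submitted, 1);
}

/* Called when a completion event for one operation is processed. */
static void op_completed(gpu_op_accounting_t *a) {
  atomic_fetch_add(&a->completed, 1);
}

/* Snapshot of operations still awaiting a completion event. The two
 * loads are separate, so the result is only exact when no other
 * thread is submitting or completing operations concurrently. */
static long ops_outstanding(gpu_op_accounting_t *a) {
  return atomic_load(&a->submitted) - atomic_load(&a->completed);
}

/* True when every submitted operation has reported completion. */
static bool all_ops_completed(gpu_op_accounting_t *a) {
  return ops_outstanding(a) == 0;
}
```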

For each of CUDA, ROCM, and OMPT, we need only maintain a boolean that indicates that an operation has been submitted since the last flush. When calling a flush operation for each of these GPU operation classes, only perform the flush if the flag is set and then clear the flag. The flag should be thread-local.
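A minimal C11 sketch of the proposed guard follows. Everything in it is hypothetical: the macro DEFINE_GPU_FLUSH_GUARD, the per-API names it generates (cuda_ops_pending, cuda_op_submitted, cuda_flush_if_pending, and so on), and the *_flush_stub functions, which only count calls and stand in for the real flush routines such as cuptiActivityFlush or ompt_trace_flush. The flag is _Thread_local, so a thread that never submitted a GPU operation never reaches the flush call.

```c
#include <stdbool.h>

/* For a given API prefix, generate:
 *   <api>_ops_pending        thread-local "submitted since last flush" flag
 *   <api>_flush_calls        counter of stub invocations (for illustration)
 *   <api>_flush_stub()       stand-in for the real flush routine
 *   <api>_op_submitted()     call on every submission for this API
 *   <api>_flush_if_pending() flush only if the flag is set, then clear it;
 *                            returns true if a flush was performed */
#define DEFINE_GPU_FLUSH_GUARD(api)                                  \
  static _Thread_local bool api##_ops_pending = false;               \
  static int api##_flush_calls = 0;                                  \
  static void api##_flush_stub(void) { api##_flush_calls++; }        \
  static void api##_op_submitted(void) { api##_ops_pending = true; } \
  static bool api##_flush_if_pending(void) {                         \
    if (!api##_ops_pending) return false;                            \
    api##_flush_stub();                                              \
    api##_ops_pending = false;                                       \
    return true;                                                     \
  }

/* One guard per GPU operation class named in the proposal. */
DEFINE_GPU_FLUSH_GUARD(cuda)
DEFINE_GPU_FLUSH_GUARD(rocm)
DEFINE_GPU_FLUSH_GUARD(ompt)
```

Generating all three guards from one macro is just one way to keep the flag and function names identical in shape across CUDA, ROCM, and OMPT; hand-written per-API code would work equally well.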

fixed for AMD OMPT in 3d9994a

We do have a flag for CUPTI.

hpctoolkit/src/tool/hpcrun/gpu/nvidia/cupti-api.c, line 1610 at commit e06dd34:

if (cupti_stop_flag) {

@Jokeren, thanks for pointing out that we do have such a flag for CUPTI. We should name these flags consistently across the GPU infrastructures.

don't flush GPU records on a thread that hasn't performed any GPU operations