don't flush GPU records on a thread that hasn't performed any GPU operations
jmellorcrummey opened this issue
Flushing GPU records by calling routines such as cuptiActivityFlush and ompt_trace_flush has caused problems when HPCToolkit's monitoring has been turned on for certain threads.
To address this issue, I propose we add a tracking mechanism for each of the flavors of GPU events we might flush:
CUDA
ROCM
OMPT
I think that HPCToolkit already has an accounting mechanism for OpenCL and Level 0 to make sure that completion events match the number of submitted operations.
For each of CUDA, ROCM, and OMPT, we need only maintain a boolean that indicates that an operation has been submitted since the last flush. When calling a flush operation for each of these GPU operation classes, only perform the flush if the flag is set and then clear the flag. The flag should be thread-local.
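A minimal sketch of the proposed mechanism follows, to make the intended behavior concrete. Every identifier in it (`gpu_kind_t`, `gpu_op_pending`, `gpu_note_submit`, `gpu_flush_if_pending`, `demo_flush`) is hypothetical and is not taken from HPCToolkit. The real flush routines are represented by a callback so the sketch does not depend on CUPTI, ROCm, or OMPT headers.

```c
#include <stdbool.h>

// Hypothetical names throughout; none of these exist in HPCToolkit.
typedef enum {
  GPU_KIND_CUDA,
  GPU_KIND_ROCM,
  GPU_KIND_OMPT,
  GPU_KIND_COUNT
} gpu_kind_t;

// Callback standing in for the real flush routine of each GPU flavor.
typedef void (*gpu_flush_fn_t)(void);

// One flag per GPU flavor, private to each thread (C11 thread-local storage).
// Zero-initialized, so a thread that never submits work never flushes.
static _Thread_local bool gpu_op_pending[GPU_KIND_COUNT];

// Call wherever this thread submits a GPU operation of the given kind.
static void gpu_note_submit(gpu_kind_t kind) {
  gpu_op_pending[kind] = true;
}

// Flush only if this thread submitted an operation since its last flush,
// then clear the flag. Returns true if the flush was performed.
static bool gpu_flush_if_pending(gpu_kind_t kind, gpu_flush_fn_t flush) {
  if (!gpu_op_pending[kind]) {
    return false;  // nothing submitted on this thread: skip the flush
  }
  flush();
  gpu_op_pending[kind] = false;
  return true;
}

// Demonstration stand-in for a real flush routine; it only counts calls.
static int demo_flush_count;
static void demo_flush(void) {
  demo_flush_count++;
}
```

With this shape, a thread that calls `gpu_flush_if_pending(GPU_KIND_CUDA, demo_flush)` without a preceding `gpu_note_submit(GPU_KIND_CUDA)` performs no flush, and a second flush request right after a successful one is also skipped.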
fixed for AMD OMPT in 3d9994a
We do have a flag for CUPTI: hpctoolkit/src/tool/hpcrun/gpu/nvidia/cupti-api.c, line 1610 in e06dd34.
@Jokeren thanks for pointing out that we do have such a flag for CUPTI. We should name them consistently across the GPU infrastructures.