eth-cscs / SpFFT

Sparse 3D FFT library with MPI, OpenMP, CUDA and ROCm support

`device_synchronize()` always ON on AMD

gsavva opened this issue

When compiling for AMD execution in Release mode, `device_synchronize()` is still active, which causes overhead (found while profiling SIRIUS; a tracing snapshot is attached at the end).

More specifically, the #ifndef NDEBUG block before and after the kernel launch is always executed.

device_synchronize() is not present when compiled with CUDA in Release mode.

This is because the corresponding HIP_FLAGS are missing, whereas for CUDA they are explicitly defined in the CMake configuration.

[image: tracing snapshot]

Thanks for reporting this. I'll look into a fix.
Until then, you can add the macro definition to the HIP_HCC_FLAGS CMake variable together with the architecture flags, for example -DHIP_HCC_FLAGS="-DNDEBUG --offload-arch=gfx906".

This should now be fixed by #54.
Note that the CMake HIP language feature is now being used, so HIP_HCC_FLAGS will no longer have any effect.