`device_synchronize()` always ON on AMD
gsavva opened this issue
When compiling for AMD execution in `Release` mode, `device_synchronize()` is still active, which causes overhead (the issue was found while profiling SIRIUS; a tracing snapshot is attached at the end).
More specifically, the `#ifndef NDEBUG` block before and after the kernel launch is always executed.
`device_synchronize()` is not present when compiled with `CUDA` in `Release` mode.
This is because the corresponding HIP flags are missing, whereas for CUDA they are explicitly defined in the CMake configuration.
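To illustrate the behaviour described above, here is a minimal host-only sketch of the guard pattern. The `device_synchronize()` below is a hypothetical stand-in that only counts calls, and `launch_kernel_checked()` and `expected_syncs_per_launch()` are illustrative names, not functions from the library. It assumes the real code wraps its synchronization calls in `#ifndef NDEBUG` in the same way.

```cpp
// Hypothetical stand-in for the real device_synchronize(): it only counts calls.
static int sync_calls = 0;
inline void device_synchronize() { ++sync_calls; }

// Hypothetical launch wrapper mirroring the pattern from the report:
// a synchronization before and after the kernel launch, in debug builds only.
inline void launch_kernel_checked() {
#ifndef NDEBUG
    device_synchronize();  // debug-only: surface errors from earlier async work
#endif
    // ... the actual kernel launch would go here ...
#ifndef NDEBUG
    device_synchronize();  // debug-only: surface errors from this kernel
#endif
}

// Number of synchronizations one launch performs under the current build flags.
constexpr int expected_syncs_per_launch() {
#ifndef NDEBUG
    return 2;  // NDEBUG not defined: both guarded calls are compiled in
#else
    return 0;  // NDEBUG defined: both guarded calls are compiled out
#endif
}
```

If the compiler never receives `-DNDEBUG`, both guarded calls are compiled in regardless of the CMake build type, which matches the always-on synchronization seen on AMD. Building the sketch with and without `-DNDEBUG` shows the difference.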
Thanks for reporting this. I'll look into a fix.
Until then, you can add the macro definition to the `HIP_HCC_FLAGS` CMake variable together with the architecture flags, for example `-DHIP_HCC_FLAGS="-DNDEBUG --offload-arch=gfx906"`.
This should now be fixed by #54.
Note that the CMake HIP language feature is now being used, so `HIP_HCC_FLAGS` will no longer have any effect.