ROCm / HIP

HIP: C++ Heterogeneous-Compute Interface for Portability

Home Page: https://rocmdocs.amd.com/projects/HIP/

Differences in throughput for an application in HIP/SYCL

jinz2014 opened this issue:

The attached paper reports that the SYCL version of the application achieves higher throughput than the HIP version, but it does not explain the performance difference.

Unlocking performance portability on LUMI-G supercomputer: A virtual screening case study
3648115.3648125.pdf

5.1.1 Software stack. [...]
Moreover, we used the HIPIFY tool to automatically generate a HIP implementation from the CUDA one, based on HIP 5.3. We have used the ROCm LLVM’s to perform a code build of the HIP version on AMD GPUs

5.2 Single GPU performance portability [...]
Moreover, we include an automatically generated HIP version for AMD GPUs, while for NVIDIA GPUs, we include a hand-optimized CUDA version.

It doesn't sound like they made any effort to tune the generated HIP version. The CUDA results on the A100 are double those of AdaptiveCpp, so there's a good chance that a hand-optimized HIP version could also outperform SYCL.
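
To illustrate why an automatically generated HIP version may be left untuned, here is a toy sketch in Python of the kind of mechanical renaming a CUDA-to-HIP translation performs. It is not the HIPIFY implementation: the `toy_hipify` function, the `score` kernel, and the `BLOCK_SIZE` value are hypothetical and are not taken from the paper. Only the handful of CUDA-to-HIP API name pairs in the table are real. The point is that API names change while tuning parameters chosen for NVIDIA hardware pass through unchanged.

```python
import re

# A small subset of real CUDA-to-HIP renames; the actual HIPIFY tools cover far more.
CUDA_TO_HIP = {
    "cuda_runtime.h": "hip/hip_runtime.h",
    "cudaMalloc": "hipMalloc",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
    "cudaMemcpy": "hipMemcpy",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
    "cudaFree": "hipFree",
}

def toy_hipify(source: str) -> str:
    """Rename known CUDA identifiers to their HIP equivalents (toy version)."""
    # Try longer names first so cudaMemcpyHostToDevice is not split by cudaMemcpy.
    names = sorted(CUDA_TO_HIP, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(n) for n in names))
    return pattern.sub(lambda m: CUDA_TO_HIP[m.group(0)], source)

# Hypothetical CUDA host code; `score` stands in for some application kernel.
cuda_src = """#include <cuda_runtime.h>
#define BLOCK_SIZE 32  // hypothetical value tuned for 32-wide NVIDIA warps
void run(float *h, size_t n) {
    float *d;
    cudaMalloc(&d, n * sizeof(float));
    cudaMemcpy(d, h, n * sizeof(float), cudaMemcpyHostToDevice);
    score<<<(n + BLOCK_SIZE - 1) / BLOCK_SIZE, BLOCK_SIZE>>>(d, n);
    cudaDeviceSynchronize();
    cudaFree(d);
}
"""

hip_src = toy_hipify(cuda_src)
print(hip_src)
```

In the translated text the `BLOCK_SIZE` definition is carried over as-is, even though the AMD GPUs in LUMI-G execute 64-wide wavefronts rather than 32-wide warps. Whether this particular kind of parameter explains the gap reported in the paper is an assumption; it is only one example of the tuning that a hand-optimized HIP port would revisit.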