spcl / dace

DaCe - Data Centric Parallel Programming

Home Page:http://dace.is/fast

NVHPC support

FlorianDeconinck opened this issue

Dear DaCe team,

Due to HPC constraints, I've been dealing with a CUDA installation provided by the NVHPC package, and it has been a pleasure that mere words cannot express. When it comes to DaCe, version 23.9 of the package paired with the CMake toolchain used by DaCe leads to:

Unsupported NVHPC compiler found. nvc++ is the only NVHPC compiler that is supported.

This is linked to the fact that FindCUDA in CMake seems to hand nvcc the C compiler nvc as its host compiler instead of nvc++. In particular, the command line shows:

nvcc [...] -ccbin /usr/local/other/nvidia/hpc_sdk/Linux_x86_64/23.9/compilers/bin/nvc [...]

Hacking the generated cmake_configure.sh to add:

-DCUDA_NVCC_FLAGS="-std=c++14 -Xcompiler -fPIC -O3 -Xcompiler -march=native -gencode arch=compute_80,code=sm_80 -ccbin /usr/local/other/nvidia/hpc_sdk/Linux_x86_64/23.9/compilers/bin/nvc++" 

to override the ccbin flag works.
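To check which host compiler a generated nvcc command line actually picks up, a small diagnostic along these lines may help. It is only a sketch: the helper names `ccbin_from_command` and `is_nvhpc_c_compiler` are made up for illustration and are not part of DaCe or CMake, and the only assumption about nvcc is that the host compiler is passed via `-ccbin` or its long form `--compiler-bindir`.

```python
import os
import shlex


def ccbin_from_command(command):
    """Return the host compiler passed to nvcc via -ccbin, or None if absent."""
    tokens = shlex.split(command)
    for i, tok in enumerate(tokens):
        # Form 1: "-ccbin /path/to/compiler"
        if tok in ("-ccbin", "--compiler-bindir") and i + 1 < len(tokens):
            return tokens[i + 1]
        # Form 2: "-ccbin=/path/to/compiler"
        if tok.startswith(("-ccbin=", "--compiler-bindir=")):
            return tok.split("=", 1)[1]
    return None


def is_nvhpc_c_compiler(path):
    """True when the host compiler is nvc (the C driver) rather than nvc++."""
    return path is not None and os.path.basename(path) == "nvc"
```

Feeding it the command line quoted above (elided parts included) returns the `.../compilers/bin/nvc` path, and `is_nvhpc_c_compiler` reports `True` for it, which matches the "Unsupported NVHPC compiler" error.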

There is an entire discussion on the subject over at CMake: https://gitlab.kitware.com/cmake/cmake/-/issues/23003

Should you fix it or demand that Nvidia stop breaking stuff for the fun of it? I will leave the decision to your fine selves.

To Reproduce
OS: SLES15
NVHPC: 23.9 (CUDA 12.2)

Sounds like a CMake issue to me.

It is, though it was raised a year ago and doesn't seem to be fixed... I guess I raised it for your support matrix: NVHPC is failing right now.

Can we add a dace config for modifying ccbin? I think we have something similar to "host compiler".

This might be fixed in #1337, which switches the CMake package from the deprecated FindCUDA to FindCUDAToolkit; that should find NVHPC and use its cc.

Oddly enough, I can't reproduce this on my system with NVHPC 23.9: my CMake on the master branch passes -ccbin /usr/bin/cc, which compiles just fine. The PR I linked builds without -ccbin and should work more consistently, but for now you can run CUDAHOSTCXX=/path/to/nvc++ python ./code.py
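For anyone launching the script from Python rather than from a shell, the CUDAHOSTCXX workaround can be sketched roughly as follows. The helper name `env_with_cuda_host_compiler` is hypothetical, and the sketch assumes that `nvc++` is either given as an absolute path or can be found on `PATH` after the NVHPC module is loaded.

```python
import os
import shutil


def env_with_cuda_host_compiler(base_env=None, compiler="nvc++"):
    """Return a copy of the environment with CUDAHOSTCXX pointing at `compiler`."""
    env = dict(os.environ if base_env is None else base_env)
    if os.path.isabs(compiler):
        # An absolute path is used as given.
        resolved = compiler
    else:
        # Otherwise look the name up on the PATH of the target environment.
        resolved = shutil.which(compiler, path=env.get("PATH"))
    if resolved is None:
        raise FileNotFoundError(f"{compiler} not found on PATH")
    env["CUDAHOSTCXX"] = resolved
    return env
```

The returned dictionary can be passed as `env=` to `subprocess.run([sys.executable, "code.py"], env=env, check=True)`, which has the same effect as prefixing the command with `CUDAHOSTCXX=/path/to/nvc++`.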

Makes sense to me that it defaults to your system cc. On the box I am working with there's no base compiler and the module load shuffles the PATH pretty hard. Anyway, yes, I figured out CUDAHOSTCXX later yesterday, so that's a good workaround.

Another is to override cuda.args in the DaCe configuration and set -ccbin by hand.
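A minimal sketch of the cuda.args route: the helper below (`with_ccbin`, a made-up name) rewrites an nvcc argument string so that it carries exactly one `-ccbin`, pointing at the compiler you choose. How the result is fed back into DaCe is an assumption here; the commented lines presume the setting lives under the configuration keys `compiler`, `cuda`, `args` and is reachable through `dace.Config.get` and `dace.Config.set`, so check that against your DaCe version.

```python
import shlex


def with_ccbin(args, host_compiler):
    """Return nvcc `args` with any existing -ccbin dropped and a new one appended."""
    kept = []
    skip_next = False
    for tok in shlex.split(args):
        if skip_next:
            # This token is the value of a preceding "-ccbin"; drop it.
            skip_next = False
            continue
        if tok in ("-ccbin", "--compiler-bindir"):
            skip_next = True
            continue
        if tok.startswith(("-ccbin=", "--compiler-bindir=")):
            continue
        kept.append(tok)
    kept += ["-ccbin", host_compiler]
    return shlex.join(kept)


# Assumed usage with DaCe (key path and calls not confirmed in this thread):
#   import dace
#   current = dace.Config.get("compiler", "cuda", "args")
#   dace.Config.set("compiler", "cuda", "args",
#                   value=with_ccbin(current, "/path/to/nvc++"))
```

Removing any existing `-ccbin` before appending the new one avoids handing nvcc two conflicting host compilers.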

I am closing the ticket - those workarounds are enough.

If you are a developer from the future reading this, use the above or get rid of nvhpc altogether, good luck.