Utopia stack with Trilinos+CUDA: cudaErrorUnsupportedPtxVersion
edopao opened this issue · comments
I have built a software stack for Utopia on Clariden using this recipe:
https://github.com/edopao/utopia-recipe/blob/ede5c35792e12c4e8a0c46918846dbc543e5665d/environments.yaml
This recipe enables the CUDA variant on all packages, with cuda_arch=80. Here is the concretisation result for Trilinos:
==> Concretized cuda@11.8
- 3xr57ku cuda@11.8.0%gcc@11.3.0~allow-unsupported-compilers~dev build_system=generic arch=linux-sles15-zen3
==> Concretized trilinos@13.4.0+amesos2+belos~epetra+intrepid2+mumps+nox+openmp+shards+suite-sparse+superlu-dist cxxstd=17
- o23zzjq trilinos@13.4.0%gcc@11.3.0~adelus~adios2~amesos+amesos2+anasazi~aztec~basker+belos~boost~chaco~complex+cuda~cuda_rdc~debug~dtk~epetra~epetraext~epetraextbtf~epetraextexperimental~epetraextgraphreorderings~exodus+explicit_template_instantiation~float+fortran~gtest~hdf5~hypre~ifpack+ifpack2~intrepid+intrepid2~ipo~isorropia+kokkos~mesquite~minitensor~ml+mpi+muelu+mumps+nox+openmp~panzer~phalanx~piro~python~rocm~rocm_rdc~rol~rythmos+sacado~scorec+shards+shared~shylu~stk~stokhos~stratimikos~strumpack+suite-sparse~superlu+superlu-dist~teko~tempus~thyra+tpetra~trilinoscouplings~uvm+wrapper~x11~zoltan~zoltan2 build_system=cmake build_type=RelWithDebInfo cuda_arch=80 cxxstd=17 gotype=long_long arch=linux-sles15-zen3
After building Utopia in the above user environment, I get a CUDA runtime error:
terminate called after throwing an instance of 'std::runtime_error'
what(): cudaDeviceSynchronize() error( cudaErrorUnsupportedPtxVersion): the provided PTX was compiled with an unsupported toolchain. /tmp/epaone/spack-stage/spack-stage-trilinos-13.4.0-o23zzjqfcj6fo55x4rqqvihjdklmo6dv/spack-src/packages/kokkos/core/src/Cuda/Kokkos_Cuda_Instance.cpp:151
Here is the output of the ldd command for reference:
$ ldd utopia_test | grep cuda
libcudart.so.11.0 => /user-environment/linux-sles15-zen3/gcc-11.3.0/cuda-11.8.0-3xr57kuw4q4cw53rscdnvqyjorpqamnp/lib64/libcudart.so.11.0 (0x00007fc4a0630000)
libnvToolsExt.so.1 => /user-environment/linux-sles15-zen3/gcc-11.3.0/cuda-11.8.0-3xr57kuw4q4cw53rscdnvqyjorpqamnp/lib64/libnvToolsExt.so.1 (0x00007fc4a0426000)
libcufft.so.10 => /user-environment/linux-sles15-zen3/gcc-11.3.0/cuda-11.8.0-3xr57kuw4q4cw53rscdnvqyjorpqamnp/lib64/libcufft.so.10 (0x00007fc48f54b000)
libcublas.so.11 => /user-environment/linux-sles15-zen3/gcc-11.3.0/cuda-11.8.0-3xr57kuw4q4cw53rscdnvqyjorpqamnp/lib64/libcublas.so.11 (0x00007fc4898ed000)
libcusparse.so.11 => /user-environment/linux-sles15-zen3/gcc-11.3.0/cuda-11.8.0-3xr57kuw4q4cw53rscdnvqyjorpqamnp/lib64/libcusparse.so.11 (0x00007fc478bf5000)
libcusolver.so.11 => /user-environment/linux-sles15-zen3/gcc-11.3.0/cuda-11.8.0-3xr57kuw4q4cw53rscdnvqyjorpqamnp/lib64/libcusolver.so.11 (0x00007fc46693d000)
libcurand.so.10 => /user-environment/linux-sles15-zen3/gcc-11.3.0/cuda-11.8.0-3xr57kuw4q4cw53rscdnvqyjorpqamnp/lib64/libcurand.so.10 (0x00007fc460061000)
libcuda.so.1 => /usr/lib64/libcuda.so.1 (0x00007fc45e831000)
libmpi_gtl_cuda.so.0 => /user-environment/linux-sles15-zen3/gcc-11.3.0/cray-mpich-8.1.24-gcc-fwf2cccra3y3lxkzw7kvqjyvwfipin4i/lib/libmpi_gtl_cuda.so.0 (0x00007fc45b7ba000)
libcublasLt.so.11 => /user-environment/linux-sles15-zen3/gcc-11.3.0/cuda-11.8.0-3xr57kuw4q4cw53rscdnvqyjorpqamnp/lib64/libcublasLt.so.11 (0x00007fc411be9000)
Hello @edopao, did you ever resolve this issue?
No, the issue is still there. I tried again today and noticed that the default CUDA architecture in the generated nvcc_wrapper is incorrect:
/user-environment/linux-sles15-zen3/gcc-11.3.0/trilinos-13.4.0-o23zzjqfcj6fo55x4rqqvihjdklmo6dv/bin/nvcc_wrapper
default_arch="sm_35"
In a local Trilinos installation on Daint, I see the correct CUDA architecture for the target GPU, since this script is generated when Trilinos is built on the target node.
Is there a way to configure the Trilinos build system to use "sm_80"?
Yes, that should be done by the variants +cuda cuda_arch=80, which seem to be picked up in the concretisation step.
==> Concretized trilinos@13.4.0+amesos2+belos~epetra+intrepid2+mumps+nox+openmp+shards+suite-sparse+superlu-dist cxxstd=17
- o23zzjq trilinos@13.4.0%gcc@11.3.0~adelus~adios2~amesos+amesos2+anasazi~aztec~basker+belos~boost~chaco~complex+cuda~cuda_rdc~debug~dtk~epetra~epetraext~epetraextbtf~epetraextexperimental~epetraextgraphreorderings~exodus+explicit_template_instantiation~float+fortran~gtest~hdf5~hypre~ifpack+ifpack2~intrepid+intrepid2~ipo~isorropia+kokkos~mesquite~minitensor~ml+mpi+muelu+mumps+nox+openmp~panzer~phalanx~piro~python~rocm~rocm_rdc~rol~rythmos+sacado~scorec+shards+shared~shylu~stk~stokhos~stratimikos~strumpack+suite-sparse~superlu+superlu-dist~teko~tempus~thyra+tpetra~trilinoscouplings~uvm+wrapper~x11~zoltan~zoltan2 build_system=cmake build_type=RelWithDebInfo cuda_arch=80 cxxstd=17 gotype=long_long arch=linux-sles15-zen3
It is very strange that it does not take effect.
The last comment I wrote is probably not relevant. When I use nvcc_wrapper from /user-environment, I see that the correct cuda_arch is set on the compile line:
/user-environment/.../nvcc_wrapper ... -arch=sm_80 ... myfile.cpp
That CUDA arch should override whatever default is set in nvcc_wrapper.
If I compile a simple CUDA program with nvcc_wrapper, it works:
$ module use /user-environment/modules/
$ module load cuda trilinos
$ which nvcc_wrapper
/user-environment/linux-sles15-zen3/gcc-11.3.0/trilinos-13.4.0-o23zzjqfcj6fo55x4rqqvihjdklmo6dv/bin/nvcc_wrapper
$ nvcc_wrapper hello.cu -o hello -arch sm_80
$ ./hello
Hello World from GPU!
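For reference, the hello.cu used above was a minimal kernel along these lines (my reconstruction, not the exact file from the test):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Trivial kernel: a single thread prints one line from the device.
__global__ void hello() {
    printf("Hello World from GPU!\n");
}

int main() {
    hello<<<1, 1>>>();
    // Synchronize so the device printf is flushed before exit. This is
    // also the point where a PTX JIT failure such as
    // cudaErrorUnsupportedPtxVersion would surface.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```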
So this is probably not an issue to be handled here; we can close it.
I found a solution to this issue. The CUDA driver installed on clariden/hohgant is from CUDA version 11.6:
$ srun -N1 --partition=nvgpu nvidia-smi
Fri Jun 16 14:45:08 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03 Driver Version: 510.47.03 CUDA Version: 11.6 |
|-------------------------------+----------------------+----------------------+
The utopia-recipe environment on the master branch specifies CUDA 11.8, which was inspired by the Stackinator examples. CUDA 11.8 is needed to support Hopper GPUs, but the clariden and hohgant nodes only have Ampere GPUs, which explains why the driver installed on these nodes is from CUDA 11.6. The CUDA 11.8 toolkit emits PTX newer than what the 11.6 driver's JIT compiler understands, hence the cudaErrorUnsupportedPtxVersion at startup.
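The mismatch can also be confirmed programmatically: the CUDA runtime API reports both the toolkit version the application was built against and the highest CUDA version the installed driver supports. A minimal check along these lines (compiled with nvcc; the file and the printed wording are mine):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int driverVersion = 0, runtimeVersion = 0;
    // Highest CUDA version supported by the installed driver
    // (e.g. 11060 for CUDA 11.6).
    cudaDriverGetVersion(&driverVersion);
    // Version of the CUDA runtime the application links against
    // (e.g. 11080 for CUDA 11.8).
    cudaRuntimeGetVersion(&runtimeVersion);
    printf("driver supports CUDA %d.%d, runtime is CUDA %d.%d\n",
           driverVersion / 1000, (driverVersion % 100) / 10,
           runtimeVersion / 1000, (runtimeVersion % 100) / 10);
    if (runtimeVersion > driverVersion)
        printf("runtime is newer than the driver: PTX JIT may fail "
               "(cudaErrorUnsupportedPtxVersion)\n");
    return 0;
}
```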
I have created a system-hohgant branch in the utopia-recipe repository to build a user environment with CUDA 11.6. This image works fine; no PTX version mismatch is observed.
Adding a reference from https://docs.nvidia.com/deploy/cuda-compatibility/index.html#application-considerations:
Applications using PTX will see runtime issues
Applications that compile device code to PTX will not work on older drivers. If the application requires PTX then admins have to upgrade the installed driver.
PTX Developers should refer to the CUDA Compatibility Developers Guide and PTX programming guide in the CUDA C++ Programming Guide for details on this limitation.
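As an alternative to downgrading the toolkit, NVIDIA's guidance above points to avoiding the PTX JIT path altogether by embedding SASS for the target architecture. With plain nvcc that would look something like the following build-command fragment (a sketch; I have not verified whether the Trilinos/Kokkos build can be driven this way through nvcc_wrapper):

```shell
# Embed only SASS for Ampere (sm_80); no PTX is included, so the driver
# never invokes its JIT compiler and cudaErrorUnsupportedPtxVersion
# cannot occur. The trade-off: the binary then runs only on sm_80 GPUs.
nvcc -gencode arch=compute_80,code=sm_80 hello.cu -o hello
```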