Utopia stack with Trilinos+CUDA: cudaErrorUnsupportedPtxVersion
edopao opened this issue · comments
I have built a software stack for Utopia on Clariden using this recipe:
https://github.com/edopao/utopia-recipe/blob/ede5c35792e12c4e8a0c46918846dbc543e5665d/environments.yaml
This recipe enables the CUDA variant on all packages, with cuda_arch=80. Here is the concretisation result for Trilinos:
==> Concretized cuda@11.8
- 3xr57ku cuda@11.8.0%gcc@11.3.0~allow-unsupported-compilers~dev build_system=generic arch=linux-sles15-zen3
==> Concretized trilinos@13.4.0+amesos2+belos~epetra+intrepid2+mumps+nox+openmp+shards+suite-sparse+superlu-dist cxxstd=17
- o23zzjq trilinos@13.4.0%gcc@11.3.0~adelus~adios2~amesos+amesos2+anasazi~aztec~basker+belos~boost~chaco~complex+cuda~cuda_rdc~debug~dtk~epetra~epetraext~epetraextbtf~epetraextexperimental~epetraextgraphreorderings~exodus+explicit_template_instantiation~float+fortran~gtest~hdf5~hypre~ifpack+ifpack2~intrepid+intrepid2~ipo~isorropia+kokkos~mesquite~minitensor~ml+mpi+muelu+mumps+nox+openmp~panzer~phalanx~piro~python~rocm~rocm_rdc~rol~rythmos+sacado~scorec+shards+shared~shylu~stk~stokhos~stratimikos~strumpack+suite-sparse~superlu+superlu-dist~teko~tempus~thyra+tpetra~trilinoscouplings~uvm+wrapper~x11~zoltan~zoltan2 build_system=cmake build_type=RelWithDebInfo cuda_arch=80 cxxstd=17 gotype=long_long arch=linux-sles15-zen3
After building Utopia in the above user environment, I get a CUDA runtime error:
terminate called after throwing an instance of 'std::runtime_error'
what(): cudaDeviceSynchronize() error( cudaErrorUnsupportedPtxVersion): the provided PTX was compiled with an unsupported toolchain. /tmp/epaone/spack-stage/spack-stage-trilinos-13.4.0-o23zzjqfcj6fo55x4rqqvihjdklmo6dv/spack-src/packages/kokkos/core/src/Cuda/Kokkos_Cuda_Instance.cpp:151
Here is the output of the ldd command for reference:
$ ldd utopia_test | grep cuda
libcudart.so.11.0 => /user-environment/linux-sles15-zen3/gcc-11.3.0/cuda-11.8.0-3xr57kuw4q4cw53rscdnvqyjorpqamnp/lib64/libcudart.so.11.0 (0x00007fc4a0630000)
libnvToolsExt.so.1 => /user-environment/linux-sles15-zen3/gcc-11.3.0/cuda-11.8.0-3xr57kuw4q4cw53rscdnvqyjorpqamnp/lib64/libnvToolsExt.so.1 (0x00007fc4a0426000)
libcufft.so.10 => /user-environment/linux-sles15-zen3/gcc-11.3.0/cuda-11.8.0-3xr57kuw4q4cw53rscdnvqyjorpqamnp/lib64/libcufft.so.10 (0x00007fc48f54b000)
libcublas.so.11 => /user-environment/linux-sles15-zen3/gcc-11.3.0/cuda-11.8.0-3xr57kuw4q4cw53rscdnvqyjorpqamnp/lib64/libcublas.so.11 (0x00007fc4898ed000)
libcusparse.so.11 => /user-environment/linux-sles15-zen3/gcc-11.3.0/cuda-11.8.0-3xr57kuw4q4cw53rscdnvqyjorpqamnp/lib64/libcusparse.so.11 (0x00007fc478bf5000)
libcusolver.so.11 => /user-environment/linux-sles15-zen3/gcc-11.3.0/cuda-11.8.0-3xr57kuw4q4cw53rscdnvqyjorpqamnp/lib64/libcusolver.so.11 (0x00007fc46693d000)
libcurand.so.10 => /user-environment/linux-sles15-zen3/gcc-11.3.0/cuda-11.8.0-3xr57kuw4q4cw53rscdnvqyjorpqamnp/lib64/libcurand.so.10 (0x00007fc460061000)
libcuda.so.1 => /usr/lib64/libcuda.so.1 (0x00007fc45e831000)
libmpi_gtl_cuda.so.0 => /user-environment/linux-sles15-zen3/gcc-11.3.0/cray-mpich-8.1.24-gcc-fwf2cccra3y3lxkzw7kvqjyvwfipin4i/lib/libmpi_gtl_cuda.so.0 (0x00007fc45b7ba000)
libcublasLt.so.11 => /user-environment/linux-sles15-zen3/gcc-11.3.0/cuda-11.8.0-3xr57kuw4q4cw53rscdnvqyjorpqamnp/lib64/libcublasLt.so.11 (0x00007fc411be9000)
Hello @edopao, did you ever resolve this issue?
No, the issue is still there. I tried again today and noticed that the default CUDA architecture in the generated nvcc_wrapper is incorrect:
/user-environment/linux-sles15-zen3/gcc-11.3.0/trilinos-13.4.0-o23zzjqfcj6fo55x4rqqvihjdklmo6dv/bin/nvcc_wrapper
default_arch="sm_35"
In a local Trilinos installation on Daint, I see the correct CUDA architecture for the target GPU, since this script is generated when Trilinos is built on the target node.
Is there a way to configure the Trilinos build system to use "sm_80"?
Yes, that should be done by the variants +cuda cuda_arch=80, which seem to be picked up in the concretisation step.
==> Concretized trilinos@13.4.0+amesos2+belos~epetra+intrepid2+mumps+nox+openmp+shards+suite-sparse+superlu-dist cxxstd=17
- o23zzjq trilinos@13.4.0%gcc@11.3.0~adelus~adios2~amesos+amesos2+anasazi~aztec~basker+belos~boost~chaco~complex+cuda~cuda_rdc~debug~dtk~epetra~epetraext~epetraextbtf~epetraextexperimental~epetraextgraphreorderings~exodus+explicit_template_instantiation~float+fortran~gtest~hdf5~hypre~ifpack+ifpack2~intrepid+intrepid2~ipo~isorropia+kokkos~mesquite~minitensor~ml+mpi+muelu+mumps+nox+openmp~panzer~phalanx~piro~python~rocm~rocm_rdc~rol~rythmos+sacado~scorec+shards+shared~shylu~stk~stokhos~stratimikos~strumpack+suite-sparse~superlu+superlu-dist~teko~tempus~thyra+tpetra~trilinoscouplings~uvm+wrapper~x11~zoltan~zoltan2 build_system=cmake build_type=RelWithDebInfo cuda_arch=80 cxxstd=17 gotype=long_long arch=linux-sles15-zen3
It is very strange that it does not take effect.
The last comment I wrote is probably not relevant. When I use nvcc_wrapper from /user-environment, I see that the correct cuda_arch is set on the compile line:
/user-environment/.../nvcc_wrapper ... -arch=sm_80 ... myfile.cpp
That CUDA arch should override whatever default is set in nvcc_wrapper.
If I compile a simple CUDA program with nvcc_wrapper, it works:
$ module use /user-environment/modules/
$ module load cuda trilinos
$ which nvcc_wrapper
/user-environment/linux-sles15-zen3/gcc-11.3.0/trilinos-13.4.0-o23zzjqfcj6fo55x4rqqvihjdklmo6dv/bin/nvcc_wrapper
$ nvcc_wrapper hello.cu -o hello -arch sm_80
$ ./hello
Hello World from GPU!
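For reference, the hello.cu used above was a minimal kernel along these lines (my reconstruction, not the exact file from the test):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Trivial kernel: a single thread prints one line from the device.
__global__ void hello() {
    printf("Hello World from GPU!\n");
}

int main() {
    hello<<<1, 1>>>();
    // Synchronize so the device printf is flushed before exit. This is
    // also the point where a PTX JIT failure such as
    // cudaErrorUnsupportedPtxVersion would surface.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```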
So this is probably not an issue to be handled here; we can close it.
I found a solution to this issue. The CUDA driver installed on clariden/hohgant is from CUDA version 11.6:
$ srun -N1 --partition=nvgpu nvidia-smi
Fri Jun 16 14:45:08 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03 Driver Version: 510.47.03 CUDA Version: 11.6 |
|-------------------------------+----------------------+----------------------+
The utopia-recipe environment on the master branch specifies CUDA 11.8, which was inspired by the Stackinator examples. CUDA 11.8 is needed to support Hopper GPUs, but the clariden and hohgant nodes only have Ampere GPUs, which explains why the driver installed on these nodes is from CUDA 11.6. The CUDA 11.8 toolkit emits PTX newer than what the 11.6 driver's JIT compiler understands, hence the cudaErrorUnsupportedPtxVersion at startup.
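The mismatch can also be confirmed programmatically: the CUDA runtime API reports both the toolkit version the application was built against and the highest CUDA version the installed driver supports. A minimal check along these lines (compiled with nvcc; the file and the printed wording are mine):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int driverVersion = 0, runtimeVersion = 0;
    // Highest CUDA version supported by the installed driver
    // (e.g. 11060 for CUDA 11.6).
    cudaDriverGetVersion(&driverVersion);
    // Version of the CUDA runtime the application links against
    // (e.g. 11080 for CUDA 11.8).
    cudaRuntimeGetVersion(&runtimeVersion);
    printf("driver supports CUDA %d.%d, runtime is CUDA %d.%d\n",
           driverVersion / 1000, (driverVersion % 100) / 10,
           runtimeVersion / 1000, (runtimeVersion % 100) / 10);
    if (runtimeVersion > driverVersion)
        printf("runtime is newer than the driver: PTX JIT may fail "
               "(cudaErrorUnsupportedPtxVersion)\n");
    return 0;
}
```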
I have created a system-hohgant branch in the utopia-recipe repository to build a user environment with CUDA 11.6. This image works fine; no PTX version mismatch is observed.
Adding a reference from https://docs.nvidia.com/deploy/cuda-compatibility/index.html#application-considerations:
Applications using PTX will see runtime issues
Applications that compile device code to PTX will not work on older drivers. If the application requires PTX then admins have to upgrade the installed driver.
PTX Developers should refer to the CUDA Compatibility Developers Guide and PTX programming guide in the CUDA C++ Programming Guide for details on this limitation.
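As an alternative to downgrading the toolkit, NVIDIA's guidance above points to avoiding the PTX JIT path altogether by embedding SASS for the target architecture. With plain nvcc that would look something like the following build-command fragment (a sketch; I have not verified whether the Trilinos/Kokkos build can be driven this way through nvcc_wrapper):

```shell
# Embed only SASS for Ampere (sm_80); no PTX is included, so the driver
# never invokes its JIT compiler and cudaErrorUnsupportedPtxVersion
# cannot occur. The trade-off: the binary then runs only on sm_80 GPUs.
nvcc -gencode arch=compute_80,code=sm_80 hello.cu -o hello
```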