CUDA version incompatibility
omarabb315 opened this issue
GPU-Jupyter Issue Report
Issue Description
When I pull and run any of these images:
v1.6_cuda-12.0_ubuntu-22.04_python-only, v1.6_cuda-11.8_ubuntu-22.04_python-only, v1.5_cuda-12.0_ubuntu-22.04_python-only, v1.5_cuda-11.8_ubuntu-22.04_python-only
and then run nvcc --version, the output shows CUDA version 12.3, even though each image was built for a different CUDA version.
To Reproduce
sudo docker run -it --rm --gpus all cschranz/gpu-jupyter:v1.5_cuda-11.8_ubuntu-22.04_python-only nvcc --version
Expected Behavior
Output showing CUDA version 11.8
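The reproduction command can be repeated for every tag listed above, with the result compared to the CUDA version named in the tag. A minimal Python sketch, assuming Docker and the NVIDIA Container Toolkit are installed on the host. The helper names (`expected_cuda`, `nvcc_command`, `nvcc_release`, `check_images`) are hypothetical, and the `cuda-X.Y` convention is inferred from the tags in this report.

```python
import re
import subprocess

# Image tags from this report; the expected CUDA version is encoded as "cuda-X.Y".
TAGS = (
    "v1.6_cuda-12.0_ubuntu-22.04_python-only",
    "v1.6_cuda-11.8_ubuntu-22.04_python-only",
    "v1.5_cuda-12.0_ubuntu-22.04_python-only",
    "v1.5_cuda-11.8_ubuntu-22.04_python-only",
)


def expected_cuda(tag: str) -> str:
    """Return the CUDA version named in an image tag, e.g. '11.8'."""
    match = re.search(r"cuda-(\d+\.\d+)", tag)
    if match is None:
        raise ValueError(f"no CUDA version in tag {tag!r}")
    return match.group(1)


def nvcc_command(tag: str) -> list[str]:
    """Build the docker invocation from the report (without sudo and -it)."""
    return ["docker", "run", "--rm", "--gpus", "all",
            f"cschranz/gpu-jupyter:{tag}", "nvcc", "--version"]


def nvcc_release(output: str) -> str:
    """Extract 'X.Y' from the 'release X.Y' part of `nvcc --version` output."""
    match = re.search(r"release (\d+\.\d+)", output)
    if match is None:
        raise ValueError("no release line in nvcc output")
    return match.group(1)


def check_images(tags=TAGS) -> None:
    """Run nvcc in each image and report whether its release matches the tag."""
    for tag in tags:
        result = subprocess.run(nvcc_command(tag), capture_output=True,
                                text=True, check=True)
        found, wanted = nvcc_release(result.stdout), expected_cuda(tag)
        status = "OK" if found == wanted else "MISMATCH"
        print(f"{tag}: expected {wanted}, nvcc reports {found} [{status}]")
```

Calling `check_images()` prints one line per tag; any image whose `nvcc` release differs from the version in its tag is flagged `MISMATCH`. `sudo` and `-it` are omitted because the output is captured instead of shown in a terminal.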
Screenshots
Environment
Operating System:
Ubuntu 22.04
NVIDIA GPU and CUDA version Details:
GPU-Jupyter Version:
any of those: v1.6_cuda-12.0_ubuntu-22.04_python-only, v1.6_cuda-11.8_ubuntu-22.04_python-only, v1.5_cuda-12.0_ubuntu-22.04_python-only, v1.5_cuda-11.8_ubuntu-22.04_python-only
Docker command and parameters:
sudo docker run -it --rm --gpus all cschranz/gpu-jupyter:v1.5_cuda-11.8_ubuntu-22.04_python-only nvcc --version
Browser (if applicable):
firefox
Hi @omarabb315
Unfortunately, this resulted from an unintended CUDA update on the build machine.
As a temporary workaround, you can build the image locally until the images are rebuilt:
./generate-Dockerfile.sh --python-only
docker build -t gpu-jupyter .build/ # will take a while
docker run --gpus all -d -it -p 8848:8888 -v $(pwd)/data:/home/jovyan/work -e GRANT_SUDO=yes -e JUPYTER_ENABLE_LAB=yes -e NB_UID="$(id -u)" -e NB_GID="$(id -g)" --user root --restart always --name gpu-jupyter_1 gpu-jupyter
Thank you for your reply. I cloned the repo, built the image locally, and used your command, but I still get the same output:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Wed_Nov_22_10:17:15_PST_2023
Cuda compilation tools, release 12.3, V12.3.107
Build cuda_12.3.r12.3/compiler.33567101_0
Did I miss something?
This is caused by lines 426 to 431 in ad9cc75, which install https://anaconda.org/nvidia/cuda-nvcc, currently at version 12.3.107.
(This has nothing to do with the CUDA version on the build machine.)
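To confirm inside a running container that the `nvcc` on `PATH` is the conda package and not a toolkit from the base image, the location of the binary can be checked. A sketch under stated assumptions: the conda prefix is read from the `CONDA_DIR` environment variable (assumed to be set, as in Jupyter Docker Stacks images), falling back to `/opt/conda`; `nvcc_origin` is a hypothetical helper.

```python
import os
import shutil
from typing import Optional


def nvcc_origin(nvcc_path: Optional[str] = None,
                conda_prefix: Optional[str] = None) -> str:
    """Classify whether the first `nvcc` on PATH lives under the conda prefix."""
    if nvcc_path is None:
        nvcc_path = shutil.which("nvcc")
    if nvcc_path is None:
        return "not found"
    if conda_prefix is None:
        # Assumption: CONDA_DIR points at the conda installation (often /opt/conda).
        conda_prefix = os.environ.get("CONDA_DIR", "/opt/conda")
    real = os.path.realpath(nvcc_path)
    prefix = os.path.realpath(conda_prefix)
    if os.path.commonpath([real, prefix]) == prefix:
        return f"conda ({real})"
    return f"system ({real})"
```

Running `print(nvcc_origin())` in a notebook cell should report a path under the conda prefix when the conda `cuda-nvcc` package supplies the compiler.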
So how can I solve this, given that I used the lines @ChristophSchranz mentioned?
@omarabb315 You can ask mamba to install a specific version: amend the mamba install command to use cuda-nvcc=12.2.140 instead (as an example).
@yankcrime Thank you for your help. I am wondering why you recommended in the pull request pinning cuda-nvcc to 12.2 when the base image has a different CUDA version (12.0.1): nvidia/cuda:12.0.1-cudnn8-runtime-ubuntu22.04.
What about using cuda-nvcc=12.0.140?
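The point behind this question is that the `cuda-nvcc` pin should share its major.minor version with the CUDA version of the base image. That check can be sketched with plain string parsing; `pin_matches_base` and `major_minor` are hypothetical helpers that compare version strings only and do not inspect an actual image.

```python
import re


def major_minor(version: str) -> tuple[int, int]:
    """Return (major, minor) from a version string such as '12.0.140'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unparseable version {version!r}")
    return int(match.group(1)), int(match.group(2))


def pin_matches_base(pin: str, base_image: str) -> bool:
    """Check that a conda pin like 'cuda-nvcc=12.0.140' has the same CUDA
    major.minor as a base image like 'nvidia/cuda:12.0.1-cudnn8-runtime-ubuntu22.04'."""
    pinned = pin.split("=", 1)[1].lstrip("=")  # accept both '=' and '==' pins
    base_tag = base_image.rsplit(":", 1)[1]
    return major_minor(pinned) == major_minor(base_tag)


base = "nvidia/cuda:12.0.1-cudnn8-runtime-ubuntu22.04"
print(pin_matches_base("cuda-nvcc=12.2.140", base))  # False: 12.2 vs 12.0
print(pin_matches_base("cuda-nvcc=12.0.140", base))  # True
```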
Thanks for the PR #130 @yankcrime !
I've adapted it to pin the CUDA version to 12.0 (as the GPU libraries don't officially support a higher version yet), using cuda-nvcc=12.0.140 as @omarabb315 suggested.
This should be resolved by #135.
Please let me know in case this error still occurs!
@ChristophSchranz Thank you for your help. After pulling the new image, I still get these messages after importing TensorFlow:
I believe this is what is crashing my multi-GPU training.
Hi @omarabb315,
I could reproduce your issue.
I think the cuDNN version preinstalled in the nvidia/cuda base image is 8.6, which causes warnings in TensorFlow versions newer than 2.13 (see here). Unfortunately, with TF 2.13 (including the with-cuda installation), TF no longer finds CUDA.
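The hypothesis here is a version floor: the cuDNN in the base image (8.6) is older than what newer TensorFlow releases expect. A minimal sketch of that comparison; the required value `8.7` is an assumption (believed to be the cuDNN version in TensorFlow's tested build configuration for 2.14) and should be checked against the official table. The helper names are hypothetical.

```python
import re


def version_tuple(version: str) -> tuple[int, ...]:
    """Turn '8.6', '8.6.0' or a package-style version like '8.6.0.163-1+cuda11.8'
    into a comparable tuple, keeping only the leading dotted numbers."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unparseable version {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def cudnn_is_new_enough(installed: str, required: str) -> bool:
    """True if the installed cuDNN is at least the required version."""
    return version_tuple(installed) >= version_tuple(required)


installed_cudnn = "8.6"  # preinstalled in the nvidia/cuda base image, per the comment above
required_cudnn = "8.7"   # ASSUMPTION: verify in TensorFlow's tested build configurations
print(cudnn_is_new_enough(installed_cudnn, required_cudnn))  # False
```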
Can you show me the output of
python -c "import tensorflow; print(tensorflow.__version__); print(tensorflow.test.is_built_with_cuda())"
python -c "from tensorflow.python.client import device_lib; print(device_lib.list_local_devices())"
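The two one-liners can be combined into one diagnostic helper that also reports the CUDA and cuDNN versions TensorFlow was built against, which is what matters when comparing against the image. A sketch assuming TensorFlow 2.x: `tf.sysconfig.get_build_info()` exists in recent 2.x releases, but keys such as `cuda_version` and `cudnn_version` may be absent on CPU-only builds, hence `.get()`. `tf_gpu_report` is a hypothetical name.

```python
def tf_gpu_report() -> dict:
    """Collect TensorFlow build and GPU information into a dict."""
    import tensorflow as tf  # imported lazily so the helper can be defined without TF

    build = tf.sysconfig.get_build_info()
    return {
        "tf_version": tf.__version__,
        "built_with_cuda": tf.test.is_built_with_cuda(),
        "build_cuda": build.get("cuda_version"),
        "build_cudnn": build.get("cudnn_version"),
        "gpus": [gpu.name for gpu in tf.config.list_physical_devices("GPU")],
    }
```

Running `for key, value in tf_gpu_report().items(): print(key, value)` in a notebook gives a compact summary that is easier to paste into an issue than the full device list.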
And have you verified that this issue actually affects performance? TF throws a lot of warnings, many of which are not that relevant. See here.
Thanks for the feedback, and sorry for the late response, @ChristophSchranz.
Yes, here is the output of the commands you asked for:
And no, I haven't verified that it affects performance, but I administer a university server with JupyterHub and need to ensure that everything works correctly and is compatible.