python installer cuda version mismatch in conda environment
HowcanoeWang opened this issue · comments
Problem statement
I hit a CUDA version mismatch when trying to install this package on Arch Linux.
Since Arch Linux always keeps its system packages at the latest version, downgrading the system CUDA is neither feasible nor convenient (it may conflict with other system software), so I prefer to use conda to manage different CUDA and cuDNN versions.
The nvcc / CUDA version of the Arch Linux system package is 12.5, located at /opt/cuda/bin/nvcc:
$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Wed_Apr_17_19:19:55_PDT_2024
Cuda compilation tools, release 12.5, V12.5.40
Build cuda_12.5.r12.5/compiler.34177558_0
Reproduce
Create a cuda=11.8 conda environment with the following commands:
$ conda create -n nerfstudio python=3.10
$ conda activate nerfstudio
(nerfstudio) $ python -m pip install --upgrade pip
(nerfstudio) $ conda install pytorch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 pytorch-cuda=11.8 -c pytorch -c nvidia
(nerfstudio) $ conda install -c "nvidia/label/cuda-11.8.0" cuda-toolkit
(nerfstudio) $ conda install ninja
Checking the nvcc version in this environment shows that cuda=11.8 was installed correctly:
(nerfstudio) $ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
(nerfstudio) $ python -c "import torch; print(torch.__version__)"
2.1.2
(nerfstudio) $ python -c "import torch; print(torch.version.cuda)"
11.8
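The two checks above can be combined into one comparison. The following is a minimal sketch, not part of tiny-cuda-nn or PyTorch: the helper names `parse_nvcc_release` and `cuda_versions_match` are hypothetical, and the plain equality test is an assumption that may be stricter than what PyTorch itself enforces.

```python
import re
from typing import Optional


def parse_nvcc_release(nvcc_output: str) -> Optional[str]:
    """Extract the 'release X.Y' version from the text printed by `nvcc -V`."""
    match = re.search(r"release\s+(\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return f"{match.group(1)}.{match.group(2)}"


def cuda_versions_match(nvcc_output: str, torch_cuda: str) -> bool:
    """Compare nvcc's major.minor release with the string from torch.version.cuda."""
    return parse_nvcc_release(nvcc_output) == torch_cuda


# Shortened copy of the `nvcc -V` output shown above.
SAMPLE = """nvcc: NVIDIA (R) Cuda compiler driver
Cuda compilation tools, release 11.8, V11.8.89
"""

print(cuda_versions_match(SAMPLE, "11.8"))  # True
```

In a real session, `nvcc_output` would come from running `nvcc -V` and `torch_cuda` from `torch.version.cuda`.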
However, installing the tiny-cuda-nn package with pip in this environment gives the following mismatch error:
(nerfstudio) $ pip install 'git+https://github.com/NVlabs/tiny-cuda-nn/#subdirectory=bindings/torch'
Collecting git+https://github.com/NVlabs/tiny-cuda-nn/#subdirectory=bindings/torch
Cloning https://github.com/NVlabs/tiny-cuda-nn/ to /tmp/pip-req-build-unexg2nn
Running command git clone --filter=blob:none --quiet https://github.com/NVlabs/tiny-cuda-nn/ /tmp/pip-req-build-unexg2nn
Resolved https://github.com/NVlabs/tiny-cuda-nn/ to commit b3473c81396fe927293bdfd5a6be32df8769927c
Running command git submodule update --init --recursive -q
Preparing metadata (setup.py) ... done
Building wheels for collected packages: tinycudann
Building wheel for tinycudann (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py bdist_wheel did not run successfully.
│ exit code: 1
╰─> [74 lines of output]
/tmp/pip-req-build-unexg2nn/bindings/torch/setup.py:5: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
from pkg_resources import parse_version
Building PyTorch extension for tiny-cuda-nn version 1.7
Obtained compute capability 86 from PyTorch
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
Detected CUDA version 11.8
Targeting C++ standard 17
running bdist_wheel
running build
running build_py
creating build
creating build/lib.linux-x86_64-cpython-310
creating build/lib.linux-x86_64-cpython-310/tinycudann
copying tinycudann/__init__.py -> build/lib.linux-x86_64-cpython-310/tinycudann
copying tinycudann/modules.py -> build/lib.linux-x86_64-cpython-310/tinycudann
running egg_info
creating tinycudann.egg-info
writing tinycudann.egg-info/PKG-INFO
writing dependency_links to tinycudann.egg-info/dependency_links.txt
writing top-level names to tinycudann.egg-info/top_level.txt
writing manifest file 'tinycudann.egg-info/SOURCES.txt'
reading manifest file 'tinycudann.egg-info/SOURCES.txt'
writing manifest file 'tinycudann.egg-info/SOURCES.txt'
copying tinycudann/bindings.cpp -> build/lib.linux-x86_64-cpython-310/tinycudann
running build_ext
Traceback (most recent call last):
File "<string>", line 2, in <module>
File "<pip-setuptools-caller>", line 34, in <module>
File "/tmp/pip-req-build-unexg2nn/bindings/torch/setup.py", line 189, in <module>
setup(
File "/home/crest/Applications/miniconda3/envs/nerfstudio/lib/python3.10/site-packages/setuptools/__init__.py", line 104, in setup
return distutils.core.setup(**attrs)
File "/home/crest/Applications/miniconda3/envs/nerfstudio/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 184, in setup
return run_commands(dist)
File "/home/crest/Applications/miniconda3/envs/nerfstudio/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 200, in run_commands
dist.run_commands()
File "/home/crest/Applications/miniconda3/envs/nerfstudio/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
self.run_command(cmd)
File "/home/crest/Applications/miniconda3/envs/nerfstudio/lib/python3.10/site-packages/setuptools/dist.py", line 967, in run_command
super().run_command(command)
File "/home/crest/Applications/miniconda3/envs/nerfstudio/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/home/crest/Applications/miniconda3/envs/nerfstudio/lib/python3.10/site-packages/wheel/bdist_wheel.py", line 368, in run
self.run_command("build")
File "/home/crest/Applications/miniconda3/envs/nerfstudio/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 316, in run_command
self.distribution.run_command(command)
File "/home/crest/Applications/miniconda3/envs/nerfstudio/lib/python3.10/site-packages/setuptools/dist.py", line 967, in run_command
super().run_command(command)
File "/home/crest/Applications/miniconda3/envs/nerfstudio/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/home/crest/Applications/miniconda3/envs/nerfstudio/lib/python3.10/site-packages/setuptools/_distutils/command/build.py", line 132, in run
self.run_command(cmd_name)
File "/home/crest/Applications/miniconda3/envs/nerfstudio/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 316, in run_command
self.distribution.run_command(command)
File "/home/crest/Applications/miniconda3/envs/nerfstudio/lib/python3.10/site-packages/setuptools/dist.py", line 967, in run_command
super().run_command(command)
File "/home/crest/Applications/miniconda3/envs/nerfstudio/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/home/crest/Applications/miniconda3/envs/nerfstudio/lib/python3.10/site-packages/setuptools/command/build_ext.py", line 91, in run
_build_ext.run(self)
File "/home/crest/Applications/miniconda3/envs/nerfstudio/lib/python3.10/site-packages/setuptools/_distutils/command/build_ext.py", line 359, in run
self.build_extensions()
File "/home/crest/Applications/miniconda3/envs/nerfstudio/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 525, in build_extensions
_check_cuda_version(compiler_name, compiler_version)
File "/home/crest/Applications/miniconda3/envs/nerfstudio/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 413, in _check_cuda_version
raise RuntimeError(CUDA_MISMATCH_MESSAGE.format(cuda_str_version, torch.version.cuda))
RuntimeError:
The detected CUDA version (12.5) mismatches the version that was used to compile
PyTorch (11.8). Please make sure to use the same CUDA versions.
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for tinycudann
Running setup.py clean for tinycudann
Failed to build tinycudann
ERROR: ERROR: Failed to build installable wheels for some pyproject.toml based projects (tinycudann)
The strange thing is that, according to the output log, it successfully detected CUDA version 11.8:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
Detected CUDA version 11.8
Targeting C++ standard 17
but for compiling, it seems to use the system CUDA version 12.5.
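When detection and compilation disagree like this, it can help to list every place a toolkit location might come from. The following is a small diagnostic sketch: the function name `cuda_candidates` is hypothetical, and it rests on the assumption that the build may consult CUDA-related environment variables in addition to PATH.

```python
import os
import shutil
from typing import Dict, Mapping, Optional


def cuda_candidates(environ: Mapping[str, str]) -> Dict[str, Optional[str]]:
    """Collect the places a build might take its CUDA toolkit from."""
    candidates: Dict[str, Optional[str]] = {
        # First nvcc found on the given PATH (None if there is none).
        "nvcc on PATH": shutil.which("nvcc", path=environ.get("PATH", "")),
    }
    # Any CUDA-related variable could redirect the build to another toolkit.
    for name in sorted(environ):
        if name.upper().startswith("CUDA"):
            candidates[name] = environ[name]
    return candidates


if __name__ == "__main__":
    for source, location in cuda_candidates(os.environ).items():
        print(f"{source}: {location}")
```

Running it inside the activated conda environment prints one line per candidate, which makes a stray system path easy to spot.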
What I have tried
Change env variables
Setting PATH and other environment variables in the shell like this does not work:
export PATH="/home/hwang/Applications/miniconda3/envs/nerfstudio/bin:$PATH"
export LD_LIBRARY_PATH="/home/hwang/Applications/miniconda3/envs/nerfstudio/lib:$LD_LIBRARY_PATH"
I even tried setting them in .bashrc and .zshrc. After that, the nvcc version outside the conda environment also changed to the conda version, but the install still does not work:
$ nvcc -V # with conda environment deactivated
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
Delete system cuda package
I renamed the system CUDA directory:
sudo mv /opt/cuda /opt/cuda.back
The installer then failed because it could not find /opt/cuda/bin/nvcc, even though it detected the conda nvcc:
/home/hwang/Documents/Github/tiny-cuda-nn/bindings/torch/setup.py:5: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
from pkg_resources import parse_version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
Building PyTorch extension for tiny-cuda-nn version 1.7
Obtained compute capability 89 from PyTorch
Detected CUDA version 11.8
Targeting C++ standard 17
running develop
/home/hwang/Applications/miniconda3/envs/sdfstudio/lib/python3.10/site-packages/setuptools/command/develop.py:40: EasyInstallDeprecationWarning: easy_install command is deprecated.
...
self.initialize_options()
running egg_info
writing tinycudann.egg-info/PKG-INFO
writing dependency_links to tinycudann.egg-info/dependency_links.txt
writing top-level names to tinycudann.egg-info/top_level.txt
reading manifest file 'tinycudann.egg-info/SOURCES.txt'
writing manifest file 'tinycudann.egg-info/SOURCES.txt'
running build_ext
error: [Errno 2] No such file or directory: '/opt/cuda/bin/nvcc'
Expected solution
Any ideas or suggestions for making the installer use the conda environment's nvcc rather than the system one?
Hi, thanks for the detailed report! I have to admit that I have no idea why the build system defaults to the system CUDA rather than the one that is foremost in your conda environment PATH.
For reference: the code for setting up the Python bindings is in setup.py, which, according to your logs, does seem to correctly detect CUDA 11.8.
Like you say, it's only the build itself that seems to use the system's CUDA -- and this is parameterized by Python's setuptools rather than ourselves (see the calls to CUDAExtension(...) and setup(...)). In their documentation (1, 2), I don't see a straightforward way to parameterize the build system itself.
As a workaround, you could try temporarily prepending your conda's CUDA bin path to your system's PATH variable and its lib path to your system's LD_LIBRARY_PATH and then removing those again once the build succeeded.
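One way to make such an override temporary is to build a modified copy of the environment instead of editing the shell's variables. The following is a minimal sketch, assuming the conda environment keeps nvcc under bin/ and its libraries under lib/; the function name `with_conda_cuda_first` and the example prefix are hypothetical.

```python
import os
from typing import Dict, Mapping


def with_conda_cuda_first(environ: Mapping[str, str], conda_prefix: str) -> Dict[str, str]:
    """Return a copy of `environ` with the conda env's bin/ and lib/ searched first."""
    env = dict(environ)
    for var, subdir in (("PATH", "bin"), ("LD_LIBRARY_PATH", "lib")):
        entry = os.path.join(conda_prefix, subdir)
        existing = env.get(var, "")
        # Prepend, and avoid a trailing separator when the variable was unset.
        env[var] = entry + os.pathsep + existing if existing else entry
    return env


# Illustrative prefix only; the real one is the path of the active conda env.
demo = with_conda_cuda_first({"PATH": "/usr/bin"}, "/opt/conda/envs/demo")
print(demo["PATH"])
```

The returned mapping could be passed as `env=` to `subprocess.run` for the pip command, so the override disappears as soon as the build process exits and nothing has to be removed by hand.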
@Tom94 Thanks a lot for your reply. By checking the source code of torch/utils/cpp_extension.py, I found that it reads the CUDA path from CUDA_HOME or, if that is unset, from CUDA_PATH:
def _find_cuda_home() -> Optional[str]:
    r'''Finds the CUDA install path.'''
    # Guess #1
    cuda_home = os.environ.get('CUDA_HOME') or os.environ.get('CUDA_PATH')
Then I checked those two environment variables and found that CUDA_PATH was misleading the compiler:
$ echo $CUDA_PATH
/opt/cuda
By pointing it at the conda environment, I fixed the issue (though other compilation issues came up afterwards, unrelated to this topic).
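The effect of that change can be sketched as follows. This is an illustration, not PyTorch's code: `guess_cuda_home` mirrors only the first guess quoted above, `point_cuda_at_conda` is a hypothetical helper, the conda prefix is an example value, and treating the conda environment root as the CUDA home is an assumption based on nvcc living under its bin/ directory.

```python
from typing import Dict, Mapping, Optional


def guess_cuda_home(environ: Mapping[str, str]) -> Optional[str]:
    """Mirror only the first guess quoted above: CUDA_HOME, then CUDA_PATH."""
    return environ.get("CUDA_HOME") or environ.get("CUDA_PATH")


def point_cuda_at_conda(environ: Mapping[str, str]) -> Dict[str, str]:
    """Return a copy of `environ` whose CUDA_HOME/CUDA_PATH name the active conda env."""
    env = dict(environ)
    prefix = env.get("CONDA_PREFIX")  # set by `conda activate`
    if prefix:
        # Assumption: the toolkit installed by conda sits directly under the env root.
        env["CUDA_HOME"] = prefix
        env["CUDA_PATH"] = prefix
    return env


# Example values only; CUDA_PATH matches the value reported in this thread.
before = {"CONDA_PREFIX": "/home/user/miniconda3/envs/nerfstudio", "CUDA_PATH": "/opt/cuda"}
print(guess_cuda_home(before))                       # /opt/cuda
print(guess_cuda_home(point_cuda_at_conda(before)))  # /home/user/miniconda3/envs/nerfstudio
```

In a shell, the equivalent would be exporting CUDA_HOME and CUDA_PATH to the conda environment's prefix before running pip.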