junyanz / pytorch-CycleGAN-and-pix2pix

Image-to-Image Translation in PyTorch

RuntimeError: CUDA driver initialization failed, you might not have a CUDA gpu

DovydasSunnus opened this issue

The prebuilt Docker image works, but it contains an older version of the code. I have now built a new image, but training aborts with this error on startup:

python train.py --dataroot /mnt/sfs_turbo/training-images --name arma_cyclegan --preprocess scale_width_and_crop --load_size 1080 --crop_size 360 --model cycle_gan

Traceback (most recent call last):
  File "/workspace/pytorch-CycleGAN-and-pix2pix/train.py", line 28, in <module>
    opt = TrainOptions().parse()   # get training options
  File "/workspace/pytorch-CycleGAN-and-pix2pix/options/base_options.py", line 134, in parse
    torch.cuda.set_device(opt.gpu_ids[0])
  File "/miniconda/lib/python3.9/site-packages/torch/cuda/__init__.py", line 313, in set_device
    torch._C._cuda_setDevice(device)
  File "/miniconda/lib/python3.9/site-packages/torch/cuda/__init__.py", line 216, in _lazy_init
    torch._C._cuda_init()
RuntimeError: CUDA driver initialization failed, you might not have a CUDA gpu.

`root@60c73da267ab:/workspace/pytorch-CycleGAN-and-pix2pix# nvidia-smi

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.226.00   Driver Version: 418.226.00   CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  Off  | 00000000:21:01.0 Off |                    0 |
| N/A   32C    P0    35W / 300W |      0MiB / 16130MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
`

I have no clue what is wrong :(
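The traceback above shows the error being raised from `torch.cuda.set_device(opt.gpu_ids[0])` in `options/base_options.py`, that is, while the options are parsed and before any training code runs. As a rough sketch of how a script could decide up front whether to use the GPU at all, the snippet below picks a device string from a `--gpu_ids`-style value. The helper names `parse_gpu_ids`, `select_device` and `cuda_available` are made up for illustration and are not part of the repository; `torch.cuda.is_available()` is the only PyTorch call used.

```python
from typing import List


def parse_gpu_ids(gpu_ids: str) -> List[int]:
    """Parse a comma-separated id string such as "0" or "0,1"; negative ids are dropped."""
    ids = []
    for token in gpu_ids.split(","):
        token = token.strip()
        if token and int(token) >= 0:
            ids.append(int(token))
    return ids


def select_device(gpu_ids: str, cuda_available: bool) -> str:
    """Return "cuda:<first id>" only when an id was requested and CUDA is usable."""
    ids = parse_gpu_ids(gpu_ids)
    if ids and cuda_available:
        return f"cuda:{ids[0]}"
    return "cpu"


def cuda_available() -> bool:
    """Ask PyTorch whether CUDA can be used; False if torch is not installed."""
    try:
        import torch  # imported lazily so the helpers above work without PyTorch
    except ImportError:
        return False
    return torch.cuda.is_available()


if __name__ == "__main__":
    # Hypothetical usage: "0" mirrors a single-GPU --gpu_ids value.
    print(select_device("0", cuda_available()))
```

If this reports `cpu` inside the container while `nvidia-smi` lists a GPU, the problem is likely between PyTorch and the driver rather than in the training script. The repository's options also accept `--gpu_ids -1` to run on the CPU, which can help confirm that the rest of the setup works.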

Got it working now with CUDA 11.4, on this GPU and driver:

Tesla T4
NVIDIA-SMI 470.129.06 Driver Version: 470.129.06 CUDA Version: 11.4
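The two `nvidia-smi` headers in this thread differ in both driver and reported CUDA version: 418.226.00 / 10.1 for the failing setup, 470.129.06 / 11.4 for the working one. One plausible reading, which the thread does not confirm, is that the older driver could not serve the CUDA version that the conda-installed PyTorch build targets. Below is a minimal sketch for comparing the two, assuming the header text is available as a string. The function names are hypothetical, and the build version `"11.3"` is only a placeholder; in a real check, `torch.version.cuda` reports the version PyTorch was built against.

```python
import re
from typing import Optional, Tuple

# Matches the "Driver Version: X   CUDA Version: Y" part of the nvidia-smi header.
_HEADER = re.compile(r"Driver Version:\s*([\d.]+)\s+CUDA Version:\s*([\d.]+)")


def _version(text: str) -> Tuple[int, ...]:
    """Turn "11.4" into (11, 4); only major and minor are kept."""
    return tuple(int(part) for part in text.split(".")[:2])


def parse_smi_header(text: str) -> Optional[Tuple[str, Tuple[int, ...]]]:
    """Return (driver version, CUDA version tuple), or None if no header is found."""
    match = _HEADER.search(text)
    if match is None:
        return None
    return match.group(1), _version(match.group(2))


def driver_supports(smi_cuda: Tuple[int, ...], build_cuda: str) -> bool:
    """Conservative check: driver-reported CUDA version >= the build's CUDA version."""
    return tuple(smi_cuda) >= _version(build_cuda)


if __name__ == "__main__":
    failing = "NVIDIA-SMI 418.226.00 Driver Version: 418.226.00 CUDA Version: 10.1"
    working = "NVIDIA-SMI 470.129.06 Driver Version: 470.129.06 CUDA Version: 11.4"
    build_cuda = "11.3"  # placeholder; use torch.version.cuda in a real check
    for header in (failing, working):
        parsed = parse_smi_header(header)
        if parsed is not None:
            driver, cuda = parsed
            print(driver, cuda, driver_supports(cuda, build_cuda))
```

The comparison is deliberately strict. NVIDIA's compatibility rules allow some newer minor versions to run on an older driver within the same major release, so a `False` result here is a hint to check the compatibility table, not proof of a mismatch.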

Dockerfile:

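# CUDA 11.4 base image, matching the CUDA version reported by nvidia-smi on the host
FROM nvidia/cuda:11.4.0-base

# Fetch the signing key for NVIDIA's CUDA apt repository (ubuntu1804) before running apt
RUN apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/3bf863cc.pub

# Basic tools, then a non-interactive Miniconda install into /miniconda
RUN apt update && apt install -y wget unzip curl bzip2 git
RUN curl -LO http://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
RUN bash Miniconda3-latest-Linux-x86_64.sh -p /miniconda -b
RUN rm Miniconda3-latest-Linux-x86_64.sh
ENV PATH=/miniconda/bin:${PATH}
RUN conda update -y conda

# PyTorch and torchvision from the pytorch channel
RUN conda install -y pytorch torchvision -c pytorch
# Clone the repository and install its Python requirements
RUN mkdir /workspace/ && cd /workspace/ && git clone https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix.git && cd pytorch-CycleGAN-and-pix2pix && pip install -r requirements.txt

WORKDIR /workspace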