Cuda is not available

Question

filevich opened this issue 2 years ago · comments

CPU-only works fine, but when I run

gotch.NewCuda().CudaIfAvailable()

I get "Cuda is not available."

Can anybody help me out?

I have very carefully followed every step in the installation guide, and I even got the "creating $GOTCH_LIB_FILE for GPU" message at the end of the installation process.

OS: Ubuntu 20.04
GPU: RTX 3060

/usr/local/cuda-11.3 ✅✅

$ ls /usr/local/cuda/include | grep cudnn
cudnn.h

$ ls /usr/local/cuda/lib64 | grep cudnn  
libcudnn_adv_infer.so
libcudnn_adv_infer.so.8
(...)
libcudnn_static.a
libcudnn_static_v8.a

$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Mon_May__3_19:15:13_PDT_2021
Cuda compilation tools, release 11.3, V11.3.109
Build cuda_11.3.r11.3/compiler.29920130_0

$ echo $LD_LIBRARY_PATH;
$ echo $CUDA_VERSION;
$ echo $CUDA_VERSION;
$ echo $CU_VERSION;
$ echo $GOTCH_LIBTORCH;
$ echo $LIBRARY_PATH;
$ echo $CPATH;
$ echo $LD_LIBRARY_PATH;

/usr/local/cuda-11.3/lib64::/usr/local/lib/libtorch/lib:/usr/lib64-nvidia:/usr/local/cuda-11.3/lib64
11.3
11.3

/usr/local/lib/libtorch
:/usr/local/lib/libtorch/lib
:/usr/local/lib/libtorch/lib:/usr/local/lib/libtorch/include:/usr/local/lib/libtorch/include/torch/csrc/api/include
/usr/local/cuda-11.3/lib64::/usr/local/lib/libtorch/lib:/usr/lib64-nvidia:/usr/local/cuda-11.3/lib64
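The values above contain an empty entry (the "::" in LD_LIBRARY_PATH, the leading ":" in LIBRARY_PATH and CPATH) and a repeated CUDA directory. The dynamic linker reads a zero-length entry as the current working directory, which is usually harmless and is not necessarily related to the error, but it is easy to check for. A minimal sketch in Go, using only the standard library; the names InspectPathList and Report are made up for this example:

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// PathIssues summarises suspicious entries in a colon-separated search path.
type PathIssues struct {
	Empty      int      // zero-length entries, e.g. from "::" or a leading ":"
	Duplicates []string // entries that occur more than once
}

// InspectPathList splits value on ":" and counts empty and repeated entries.
func InspectPathList(value string) PathIssues {
	var issues PathIssues
	if value == "" {
		return issues // unset or empty variable: nothing to report
	}
	seen := make(map[string]int)
	for _, entry := range strings.Split(value, ":") {
		if entry == "" {
			issues.Empty++
			continue
		}
		seen[entry]++
		if seen[entry] == 2 { // record each duplicate only once
			issues.Duplicates = append(issues.Duplicates, entry)
		}
	}
	return issues
}

// Report formats the findings for one environment variable.
func Report(name string) string {
	issues := InspectPathList(os.Getenv(name))
	return fmt.Sprintf("%s: %d empty entries, duplicates: %v",
		name, issues.Empty, issues.Duplicates)
}

func main() {
	for _, name := range []string{"LD_LIBRARY_PATH", "LIBRARY_PATH", "CPATH"} {
		fmt.Println(Report(name))
	}
}
```

Run it in the same shell that is used for go run, so that it sees the same environment.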

$ nvidia-smi

Tue Aug 23 00:56:51 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01    Driver Version: 515.65.01    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:03:00.0  On |                  N/A |
|  0%   39C    P8    19W / 170W |    283MiB / 12288MiB |      1%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1049      G   /usr/lib/xorg/Xorg                 29MiB |
|    0   N/A  N/A      1567      G   /usr/lib/xorg/Xorg                 97MiB |
|    0   N/A  N/A      1701      G   /usr/bin/gnome-shell               40MiB |
|    0   N/A  N/A      2081      G   ...867554538896762554,131072       65MiB |
|    0   N/A  N/A     31849      G   ...RendererForSitePerProcess       39MiB |
+-----------------------------------------------------------------------------+
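Note that the "CUDA Version: 11.7" in the nvidia-smi header is the highest CUDA version the installed driver supports, not the installed toolkit, which nvcc reports as 11.3. A driver that supports a newer version can normally run programs built against an older toolkit, so this pairing should be fine by itself. A small sketch of that comparison in Go; the function names are made up for this example and the version strings are the ones from the output above:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion turns a "MAJOR.MINOR" string such as "11.3" into (11, 3).
func parseVersion(s string) (major, minor int, err error) {
	parts := strings.Split(strings.TrimSpace(s), ".")
	if len(parts) < 2 {
		return 0, 0, fmt.Errorf("expected MAJOR.MINOR, got %q", s)
	}
	if major, err = strconv.Atoi(parts[0]); err != nil {
		return 0, 0, fmt.Errorf("bad major version in %q: %w", s, err)
	}
	if minor, err = strconv.Atoi(parts[1]); err != nil {
		return 0, 0, fmt.Errorf("bad minor version in %q: %w", s, err)
	}
	return major, minor, nil
}

// DriverSupports reports whether the driver's maximum CUDA version
// (from nvidia-smi) is at least the toolkit version (from nvcc -V).
func DriverSupports(driverMax, toolkit string) (bool, error) {
	dMaj, dMin, err := parseVersion(driverMax)
	if err != nil {
		return false, err
	}
	tMaj, tMin, err := parseVersion(toolkit)
	if err != nil {
		return false, err
	}
	return dMaj > tMaj || (dMaj == tMaj && dMin >= tMin), nil
}

func main() {
	ok, err := DriverSupports("11.7", "11.3")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("driver covers toolkit:", ok)
}
```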

One interesting thing:

When I run the Go program with go run *.go, it prints the error message "Cuda is not available." as I mentioned before; but when I run go build *.go && ./main, it hangs. No output, nothing.

nvidia drivers + CUDA 11.3 + cudnn were installed using this gist / script in a fresh Ubuntu 20.04 partition

then installed libtorch according to the README guide using export CUDA_VER=11.3 && bash setup-libtorch.sh

and finally gotch using export CUDA_VER=11.3 && export GOTCH_VER=v0.7.0 && bash setup-gotch.sh

No errors when linking or compiling; everything ran as expected.

I have already tried Ubuntu 22.04 + CUDA 11.7 + libtorch 1.12 (not 1.11), but it wouldn't even compile.
Maybe I'll try Ubuntu 18 + CUDA 10.2
Or downgrading nvidia drivers

🤷‍♂️🤷‍♂️

Any help appreciated

sugarme · Answer 1 · Tue Aug 23 2022 15:52:45 GMT+0800 (China Standard Time)

Hi @filevich,

I don't have any machines with CUDA 11.3 now. However, please have a look at the Google Colab I set up with Gotch and CUDA 11.3 here.

Maybe you should delete libtorch at /usr/local/lib/libtorch and reinstall it. Also, try using clang instead of gcc as the C compiler, as in the Google Colab.

Try:

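package main

import (
	"fmt"

	"github.com/sugarme/gotch"
	"github.com/sugarme/gotch/ts"
)

func main() {
	// Pick the CUDA device if one is available, otherwise fall back to the CPU.
	device := gotch.CudaIfAvailable()

	fmt.Println(device)

	// Create a 3x4x5 tensor of ones (float64) on the selected device.
	x := ts.MustOnes([]int64{3, 4, 5}, gotch.Double, device)

	// Print the tensor with the %i verb, which the tensor formats itself.
	fmt.Printf("%i", x)
}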

I am using CUDA 11.1 with an RTX 3060 and it runs just fine.