Mozilla-Ocho / llamafile

Distribute and run LLMs with a single file.

Home Page: https://llamafile.ai

Issue Running llamafile on Ubuntu Server with Dual RTX 3060 GPUs

AluoExpiry opened this issue

Hello,

I'm trying to run the llamafile project on an Ubuntu server equipped with two RTX 3060 GPUs. I have correctly installed the NVIDIA drivers and the CUDA environment. However, when I attempt to run llamafile, I encounter the following error:

import_cuda_impl: initializing gpu module...
compile_nvidia: note: building ggml-cuda with nvcc -arch=native...
llamafile_log_command: /usr/bin/nvcc -arch=native --shared --forward-unknown-to-host-compiler --compiler-options "-fPIC -O3 -march=native -mtune=native" -DNDEBUG -DGGML_BUILD=1 -DGGML_SHARED=1 -DGGML_CUDA_MMV_Y=1 -DGGML_MULTIPLATFORM -DGGML_CUDA_DMMV_X=32 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUBLAS -o /home/aluo/.llamafile/ggml-cuda.so.zrmqrt /home/aluo/.llamafile/ggml-cuda.cu -lcublas -lcuda
nvcc fatal   : Value 'native' is not defined for option 'gpu-architecture'
Compile: warning: /usr/bin/nvcc returned nonzero exit status
get_nvcc_arch_flag: note: building nvidia compute capability detector...
llamafile_log_command: /usr/bin/nvcc -o /home/aluo/.llamafile/compcap.zmapgc /home/aluo/.llamafile/compcap.cu
llamafile_log_command: /home/aluo/.llamafile/compcap
compile_nvidia: note: building ggml-cuda with nvcc -arch=compute_86...
llamafile_log_command: /usr/bin/nvcc -arch=compute_86 --shared --forward-unknown-to-host-compiler --compiler-options "-fPIC -O3 -march=native -mtune=native" -DNDEBUG -DGGML_BUILD=1 -DGGML_SHARED=1 -DGGML_CUDA_MMV_Y=1 -DGGML_MULTIPLATFORM -DGGML_CUDA_DMMV_X=32 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUBLAS -o /home/aluo/.llamafile/ggml-cuda.so.zrmqrt /home/aluo/.llamafile/ggml-cuda.cu -lcublas -lcuda
/home/aluo/.llamafile/ggml-cuda.cu(364): warning #1305-D: function declared with "noreturn" does return

/home/aluo/.llamafile/ggml-cuda.cu(364): warning #1305-D: function declared with "noreturn" does return

/usr/include/c++/11/bits/std_function.h:435:145: error: parameter packs not expanded with '...':
  435 |         function(_Functor&& __f)
      |                                 ^
/usr/include/c++/11/bits/std_function.h:435:145: note: '_ArgTypes'
/usr/include/c++/11/bits/std_function.h:530:146: error: parameter packs not expanded with '...':
  530 |         operator=(_Functor&& __f)
      |                                  ^
/usr/include/c++/11/bits/std_function.h:530:146: note: '_ArgTypes'
/home/aluo/.llamafile/ggml-cuda.cu: In function 'void ggml_cuda_error(const char*, const char*, const char*, int, const char*)':
/home/aluo/.llamafile/ggml-cuda.cu:364:1: warning: 'noreturn' function does return
  364 | }
      | ^
/home/aluo/.llamafile/ggml-cuda.cu: In function 'int64_t get_row_rounding(ggml_type, const std::array<float, 16>&)':
/home/aluo/.llamafile/ggml-cuda.cu:1152:18: warning: control reaches end of non-void function [-Wreturn-type]
 1152 |     GGML_ASSERT(false);
      |                 ~~~^
Compile: warning: /usr/bin/nvcc returned nonzero exit status
extract_cuda_dso: note: prebuilt binary /zip/ggml-cuda.so not found
fatal error: support for --gpu nvidia was explicitly requested, but it wasn't available

I'm not sure what's causing this issue. The error message seems to say that my nvcc does not recognize the 'native' value for the 'gpu-architecture' option, and the fallback compile with -arch=compute_86 then fails inside the GCC 11 headers. I've tried looking into this, but I haven't been able to find a solution.
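For reference, this is how I have been checking what my local nvcc actually supports; as far as I can tell the 'native' value for -arch only exists in fairly recent CUDA toolkits, so an older distro-packaged nvcc would reject it (plain nvcc commands, nothing llamafile-specific):

# toolkit version reported by the compiler (this can differ from the driver's
# CUDA version shown by nvidia-smi)
nvcc --version

# list the compute_XX architectures this nvcc knows about
nvcc --list-gpu-arch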

nvidia-smi output: (screenshot)

nvcc --version output: (screenshot)

Device info:
Intel Core i5-12600KF
ASUS TUF RTX 3060 12 GB
ASUS RTX 3060 KO 12 GB
32 GB RAM
Ubuntu 22.04.4 LTS (GNU/Linux 5.15.0-102-generic x86_64)

Model used:
WizardCoder-Python-34B

Command executed:
./llamafile --gpu nvidia -ngl 35
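For completeness, these are the variants I was planning to try next, based on my reading of the llamafile GPU documentation (the flag names are my understanding of the docs, so please correct me if they differ):

# force ggml-cuda.so to be rebuilt, e.g. after installing a newer CUDA toolkit
./llamafile --gpu nvidia --recompile -ngl 35

# skip the runtime nvcc compile and rely on a prebuilt DSO, if this build ships one
# (my log above says "prebuilt binary /zip/ggml-cuda.so not found", so this may not apply here)
./llamafile --gpu nvidia --nocompile -ngl 35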

Any help would be greatly appreciated!

I have a similar problem. I have an NVIDIA Corporation GA106 [RTX A2000 12GB] and nvcc is installed, but it seems that my GPU is not being used.
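The only way I know to confirm whether the GPU is really being used is to watch nvidia-smi while a prompt is being processed, and to look for a line roughly like "offloaded N/N layers to GPU" in the startup log; something along these lines:

# terminal 1: start the model with GPU offload requested
./llamafile --gpu nvidia -ngl 35

# terminal 2: watch VRAM usage and GPU utilization while it answers a prompt
watch -n 1 nvidia-smi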