Atinoda / text-generation-webui-docker

Docker variants of oobabooga's text-generation-webui, including pre-built images.

Can't load models with CPU

RBNXI opened this issue · comments

commented

I used this docker-compose to try to run under CPU, but when loading a model I get a CUDA version error:

text-generation-webui | === Running text-generation-webui variant: 'DEFAULT' ===
text-generation-webui | === (This version is 75 commits behind origin) ===
text-generation-webui | === Image build date: 2023-07-18 18:43:00 ===
text-generation-webui | /venv/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so: undefined symbol: cadam32bit_grad_fp32
text-generation-webui | /venv/lib/python3.10/site-packages/bitsandbytes/cextension.py:34: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
text-generation-webui |   warn("The installed version of bitsandbytes was compiled without GPU support. "
text-generation-webui | 2023-07-30 18:34:51 INFO:Loading ggml-model-q4_1.bin...
text-generation-webui | CUDA error 35 at ggml-cuda.cu:2478: CUDA driver version is insufficient for CUDA runtime version
text-generation-webui | /arrow/cpp/src/arrow/filesystem/s3fs.cc:2598: arrow::fs::FinalizeS3 was not called even though S3 was initialized. This could lead to a segmentation fault at exit
text-generation-webui exited with code 1

Isn't this supposed to load without trying to use an NVIDIA GPU? I have an AMD card and was trying to use the CPU instead, but it doesn't work...
I'm on Linux; do I need to take any extra steps?

It should work fine (albeit very slowly) without a GPU. Did you check #9 and #13 to see if they apply to you?

Is this running on your local rig or a cloud instance?

Same problem. I tried the solutions from #9 and #13, with no success. It fails even on first start with GPU-related options enabled:

text-generation-webui  | === Running text-generation-webui variant: 'DEFAULT' ===
text-generation-webui  | === (This version is 28 commits behind origin) ===
text-generation-webui  | === Image build date: 2023-08-01 17:57:43 ===
text-generation-webui  | /venv/lib/python3.10/site-packages/torch/cuda/__init__.py:107: UserWarning: CUDA initialization: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero. (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:109.)
text-generation-webui  |   return torch._C._cuda_getDeviceCount() > 0
text-generation-webui  | /venv/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so: undefined symbol: cadam32bit_grad_fp32
text-generation-webui  | /venv/lib/python3.10/site-packages/bitsandbytes/cextension.py:34: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
text-generation-webui  |   warn("The installed version of bitsandbytes was compiled without GPU support. "
text-generation-webui  | Traceback (most recent call last):
text-generation-webui  |   File "/app/server.py", line 1174, in <module>
text-generation-webui  |     create_interface()
text-generation-webui  |   File "/app/server.py", line 811, in create_interface
text-generation-webui  |     create_model_menus()
text-generation-webui  |   File "/app/server.py", line 169, in create_model_menus
text-generation-webui  |     total_mem.append(math.floor(torch.cuda.get_device_properties(i).total_memory / (1024 * 1024)))
text-generation-webui  |   File "/venv/lib/python3.10/site-packages/torch/cuda/__init__.py", line 395, in get_device_properties
text-generation-webui  |     _lazy_init()  # will define _get_device_properties
text-generation-webui  |   File "/venv/lib/python3.10/site-packages/torch/cuda/__init__.py", line 247, in _lazy_init
text-generation-webui  |     torch._C._cuda_init()
text-generation-webui  | RuntimeError: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero.
text-generation-webui exited with code 1

Or, with all GPU-related options disabled, on model loading: #15

text-generation-webui  | CUDA error 35 at ggml-cuda.cu:2615: CUDA driver version is insufficient for CUDA runtime version
text-generation-webui  | /arrow/cpp/src/arrow/filesystem/s3fs.cc:2598:  arrow::fs::FinalizeS3 was not called even though S3 was initialized.  This could lead to a segmentation fault at exit
text-generation-webui exited with code 1

Meanwhile, the standalone text-generation-webui has no such problem with a CPU-only installation.

Thanks for confirming the issue, and that it works correctly with a bare-metal installation. The cause is that llama-cpp-python has been made CUDA-forward in the upstream project. I have introduced a new Docker variant, llama-cpu, to the images collection, which should run .ggml models correctly in CPU-only mode. Standard Transformers models should also work correctly in CPU-only mode (for all image variants).
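For anyone following along, a minimal docker-compose sketch for the CPU-only variant might look like the following. Only the image tag and the EXTRA_LAUNCH_ARGS variable come from this thread; the service name, port mapping, and volume path are illustrative assumptions, and the key point is that there is no NVIDIA device reservation in the file:

```yaml
# Hypothetical CPU-only compose file -- service name, port, and volume
# path are illustrative; only the image tag is taken from this thread.
version: "3"
services:
  text-generation-webui:
    image: atinoda/text-generation-webui:llama-cpu  # CPU-only variant
    environment:
      - EXTRA_LAUNCH_ARGS=--listen  # extra launch flags passed to the app
    ports:
      - "7860:7860"  # web UI
    volumes:
      - ./models:/app/models  # persist downloaded models (path is a guess)
    # Note: no `deploy.resources.reservations.devices` section here --
    # that block is only needed for the GPU variants.
```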

Please test and let me know if it works for you!

Hi @Atinoda,
very new to this, just gave it a try on a CPU-only instance, without any luck yet. Might just be a user error though :) maybe you can help me out?

pulled: atinoda/text-generation-webui:llama-cpu
...forwarded 7860:7860 -> web UI reachable.

...went to the "Model" section and downloaded "TheBloke/Llama-2-7B-Chat-GGML"
...hit refresh, chose the model, and kept the model loader on "llama.cpp"
...pressed "load" to load the model, which resulted in the following Docker logs:

[screenshot of the Docker logs showing the error]

My env vars might help (I basically only added the "--listen" flag; are all those NVIDIA-related flags still necessary?):

PATH=/venv/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
NVARCH=x86_64
NVIDIA_REQUIRE_CUDA=cuda>=11.8 brand=tesla,driver>=450,driver<451 brand=tesla,driver>=470,driver<471 brand=unknown,driver>=470,driver<471 brand=nvidia,driver>=470,driver<471 brand=nvidiartx,driver>=470,driver<471 brand=geforce,driver>=470,driver<471 brand=geforcertx,driver>=470,driver<471 brand=quadro,driver>=470,driver<471 brand=quadrortx,driver>=470,driver<471 brand=titan,driver>=470,driver<471 brand=titanrtx,driver>=470,driver<471 brand=tesla,driver>=510,driver<511 brand=unknown,driver>=510,driver<511 brand=nvidia,driver>=510,driver<511 brand=nvidiartx,driver>=510,driver<511 brand=geforce,driver>=510,driver<511 brand=geforcertx,driver>=510,driver<511 brand=quadro,driver>=510,driver<511 brand=quadrortx,driver>=510,driver<511 brand=titan,driver>=510,driver<511 brand=titanrtx,driver>=510,driver<511 brand=tesla,driver>=515,driver<516 brand=unknown,driver>=515,driver<516 brand=nvidia,driver>=515,driver<516 brand=nvidiartx,driver>=515,driver<516 brand=geforce,driver>=515,driver<516 brand=geforcertx,driver>=515,driver<516 brand=quadro,driver>=515,driver<516 brand=quadrortx,driver>=515,driver<516 brand=titan,driver>=515,driver<516 brand=titanrtx,driver>=515,driver<516
NV_CUDA_CUDART_VERSION=11.8.89-1
NV_CUDA_COMPAT_PACKAGE=cuda-compat-11-8
CUDA_VERSION=11.8.0
LD_LIBRARY_PATH=/usr/local/nvidia/lib:/usr/local/nvidia/lib64
NVIDIA_VISIBLE_DEVICES=all
NVIDIA_DRIVER_CAPABILITIES=compute,utility
NV_CUDA_LIB_VERSION=11.8.0-1
NV_NVTX_VERSION=11.8.86-1
NV_LIBNPP_VERSION=11.8.0.86-1
NV_LIBNPP_PACKAGE=libnpp-11-8=11.8.0.86-1
NV_LIBCUSPARSE_VERSION=11.7.5.86-1
NV_LIBCUBLAS_PACKAGE_NAME=libcublas-11-8
NV_LIBCUBLAS_VERSION=11.11.3.6-1
NV_LIBCUBLAS_PACKAGE=libcublas-11-8=11.11.3.6-1
NV_LIBNCCL_PACKAGE_NAME=libnccl2
NV_LIBNCCL_PACKAGE_VERSION=2.15.5-1
NCCL_VERSION=2.15.5-1
NV_LIBNCCL_PACKAGE=libnccl2=2.15.5-1+cuda11.8
NVIDIA_PRODUCT_NAME=CUDA
NV_CUDA_CUDART_DEV_VERSION=11.8.89-1
NV_NVML_DEV_VERSION=11.8.86-1
NV_LIBCUSPARSE_DEV_VERSION=11.7.5.86-1
NV_LIBNPP_DEV_VERSION=11.8.0.86-1
NV_LIBNPP_DEV_PACKAGE=libnpp-dev-11-8=11.8.0.86-1
NV_LIBCUBLAS_DEV_VERSION=11.11.3.6-1
NV_LIBCUBLAS_DEV_PACKAGE_NAME=libcublas-dev-11-8
NV_LIBCUBLAS_DEV_PACKAGE=libcublas-dev-11-8=11.11.3.6-1
NV_CUDA_NSIGHT_COMPUTE_VERSION=11.8.0-1
NV_CUDA_NSIGHT_COMPUTE_DEV_PACKAGE=cuda-nsight-compute-11-8=11.8.0-1
NV_NVPROF_VERSION=11.8.87-1
NV_NVPROF_DEV_PACKAGE=cuda-nvprof-11-8=11.8.87-1
NV_LIBNCCL_DEV_PACKAGE_NAME=libnccl-dev
NV_LIBNCCL_DEV_PACKAGE_VERSION=2.15.5-1
NV_LIBNCCL_DEV_PACKAGE=libnccl-dev=2.15.5-1+cuda11.8
LIBRARY_PATH=/usr/local/cuda/lib64/stubs
VIRTUAL_ENV=/venv
PYTHONUNBUFFERED=1
BUILD_DATE=2023-08-28 17:20:26
VERSION_TAG=v1.5
EXTRA_LAUNCH_ARGS=--listen
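To answer the question above (this is my understanding, not confirmed by the maintainer): the NVIDIA_* and CUDA_* variables are baked into the image's CUDA base layer and are harmless on a CPU-only host; the only variable you normally set yourself is EXTRA_LAUNCH_ARGS. A sketch of how that override might look in docker-compose (--listen and --cpu are standard text-generation-webui launch flags; whether --cpu is also needed with the llama-cpu variant is an assumption):

```yaml
services:
  text-generation-webui:
    environment:
      # Only EXTRA_LAUNCH_ARGS is user-facing; the NVIDIA_*/CUDA_* values
      # are inherited from the CUDA base image and can be left alone.
      - EXTRA_LAUNCH_ARGS=--listen --cpu
```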

Below is a screenshot of my web interface; maybe I have to add/tweak some settings?
[screenshot of the web interface]

Thanks in advance, already excited to get your container up and running! :) CPU-only would be bliss.

try llama-cpu-nightly

Thanks a lot @Philipp-Sc! That did the trick!
Fun fact: on the containerized nightly build I had the same "error" as in my self-built version. To whoever might also stumble upon this: the field below in the new UI is not a random naming field ;)
[screenshot of the relevant UI field]

But it still works pretty well in deeply nested environments, CPU only.

Closing this issue because the CPU variant is now a first-class citizen.