Torch not compiled with CUDA enabled?
SeeJayDee opened this issue
Issue:
The FastChat WebUI gave an empty response. It looks like an issue with Torch on Intel Arc (a non-CUDA GPU).
Initial debugging:
While testing, I killed all python3 processes in the container, then ran `python3 -m fastchat.serve.cli [...]` and got the following output:
```
# python3 -m fastchat.serve.cli --model-path lmsys/vicuna-7b-v1.3 --max-gpu-memory 14Gib
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>.
If you see this, DO NOT PANIC! This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you.
If you want to use the new behaviour, set `legacy=False` .
This should only be set if you understand what it means, and thouroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:57<00:00, 28.83s/it]
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.10/dist-packages/fastchat/serve/cli.py", line 280, in <module>
    main(args)
  File "/usr/local/lib/python3.10/dist-packages/fastchat/serve/cli.py", line 206, in main
    chat_loop(
  File "/usr/local/lib/python3.10/dist-packages/fastchat/serve/inference.py", line 307, in chat_loop
    model, tokenizer = load_model(
  File "/usr/local/lib/python3.10/dist-packages/fastchat/model/model_adapter.py", line 291, in load_model
    model.to(device)
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 2065, in to
    return super().to(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1145, in to
    return self._apply(convert)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 820, in _apply
    param_applied = fn(param)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1143, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
  File "/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py", line 239, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
```
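For reference, the `AssertionError` above is raised by the torch wheel itself: a CPU-only build ships no CUDA runtime, so any `model.to("cuda")` fails before the GPU is even touched (and an Arc A770 would not satisfy a CUDA check anyway). A minimal sketch of a check that confirms which kind of torch build is installed (the helper name is my own, not from FastChat):

```python
import importlib.util

def cuda_build_available() -> bool:
    """Return True only if a CUDA-enabled torch wheel is installed."""
    if importlib.util.find_spec("torch") is None:
        return False  # torch is not installed at all
    import torch
    # On a CPU-only wheel this is False and torch.version.cuda is None.
    return torch.cuda.is_available()

print(cuda_build_available())
```

If this prints `False` inside the container, FastChat's default device of `cuda` can never work there.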
Tried rolling back fastchat to 0.2.26 (was running 0.2.29):

```
pip3-autoremove "fschat[model_worker,webui]"
pip3 install "fschat[model_worker,webui]"==0.2.26
```

But to no avail.
Unfortunately this has me stumped, despite following the instructions from the README.
I also tried changing the `docker run` command to include the following:

```
--device /dev/dxg
--volume=/usr/lib/wsl:/usr/lib/wsl
```

...as was done here (in the Windows section). Neither that nor the default settings worked.
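The traceback shows `load_model` calling `model.to(device)` with the default device, `cuda`. Passing `--device cpu` to `fastchat.serve.cli` sidesteps the CUDA path (newer FastChat versions also accept `--device xpu` for Intel GPUs when `intel_extension_for_pytorch` is installed, though I have not verified that on this setup). The fallback logic involved can be sketched as (this is an illustration, not FastChat's actual code):

```python
def pick_device(requested: str, cuda_ok: bool, xpu_ok: bool) -> str:
    """Hypothetical guard: honor the requested device only if its backend exists."""
    if requested == "cuda" and not cuda_ok:
        # Prefer an Intel XPU backend if present, otherwise fall back to CPU
        # instead of letting model.to("cuda") raise the AssertionError above.
        return "xpu" if xpu_ok else "cpu"
    return requested

print(pick_device("cuda", cuda_ok=False, xpu_ok=False))  # falls back to "cpu"
```

On this system (CPU-only torch wheel, Arc A770), the request for `cuda` would resolve to `cpu` unless an XPU-enabled stack is installed.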
System:
- Win 11 Pro / 22H2
- Docker Desktop 4.25.0 (using WSL2)
- i7-11700KF
- Arc A770 16GB
- 32GB RAM
Please pull the latest image from Docker Hub and dump the logs to a local folder (`-v /local/path:/logs`).
If you can post the output from the default startup, I can try to see what's happening.