itlackey / ipex-arc-fastchat


Torch not compiled with CUDA enabled?

SeeJayDee opened this issue

Issue:

The FastChat WebUI gave an empty response. It looks like an issue with Torch on Intel Arc (non-CUDA) hardware.

Initial debugging:

While testing, I killed all python3 processes in the container, then ran python3 -m fastchat.serve.cli [...].

I got the following output:
# python3 -m fastchat.serve.cli --model-path lmsys/vicuna-7b-v1.3 --max-gpu-memory 14Gib
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. 
If you see this, DO NOT PANIC! This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you.
If you want to use the new behaviour, set `legacy=False` .
This should only be set if you understand what it means, and thouroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565 
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:57<00:00, 28.83s/it]
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals) 
  File "/usr/local/lib/python3.10/dist-packages/fastchat/serve/cli.py", line 280, in <module>  
    main(args)  
  File "/usr/local/lib/python3.10/dist-packages/fastchat/serve/cli.py", line 206, in main 
    chat_loop(  
  File "/usr/local/lib/python3.10/dist-packages/fastchat/serve/inference.py", line 307, in chat_loop 
    model, tokenizer = load_model(
  File "/usr/local/lib/python3.10/dist-packages/fastchat/model/model_adapter.py", line 291, in load_model
    model.to(device)
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 2065, in to
    return super().to(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1145, in to
    return self._apply(convert)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 820, in _apply
    param_applied = fn(param)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1143, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
  File "/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py", line 239, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
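
From the trace, load_model ends up calling model.to(device) with device set to "cuda" (FastChat's default), while the torch build in this image is the Intel/XPU one, so the CUDA lazy init asserts. A quick way to confirm whether the XPU backend is actually visible inside the container is something like the sketch below (an assumption on my part that intel_extension_for_pytorch is installed in the image, which is how I understand the ipex-arc-fastchat setup; recent FastChat versions also appear to accept --device xpu so the CLI doesn't default to CUDA):

import torch
import intel_extension_for_pytorch as ipex  # registers the torch.xpu backend

# If the import fails or this prints False, the container cannot see the Arc GPU,
# and FastChat falling back to the CUDA path would hit the assertion above.
print("xpu available:", torch.xpu.is_available())
if torch.xpu.is_available():
    print("device count:", torch.xpu.device_count())
    print("device name :", torch.xpu.get_device_name(0))

(Running it with python3 - <<'EOF' ... EOF inside the container is enough to get a yes/no answer.)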

I tried rolling back FastChat to 0.2.26 (I was running 0.2.29):
pip3-autoremove "fschat[model_worker,webui]"
pip3 install "fschat[model_worker,webui]"==0.2.26

But to no avail.

Unfortunately this has me stumped, despite following the instructions in the README.
I also tried changing the docker run command to include the following:
--device /dev/dxg
--volume=/usr/lib/wsl:/usr/lib/wsl
...as was done here (in the Windows section).
Neither that nor the default settings worked.
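
For reference, my understanding is that on an Arc card the model load should end up on the XPU backend rather than CUDA, roughly like the sketch below (an assumption only, not the container's actual code; it assumes intel_extension_for_pytorch is present and the vicuna-7b-v1.3 weights are already cached, and ipex.optimize is the usual optional IPEX step):

import torch
import intel_extension_for_pytorch as ipex
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "lmsys/vicuna-7b-v1.3"
tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.float16, low_cpu_mem_usage=True
)

# Place the model on the Arc GPU ("xpu") instead of "cuda" --
# this is the step that raises the assertion in the trace above.
model = model.to("xpu")
model = ipex.optimize(model, dtype=torch.float16)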

System:

  • Win 11 Pro / 22H2
  • Docker Desktop 4.25.0 (using WSL2)
  • i7-11700KF
  • Arc A770 16GB
  • 32GB RAM

Please pull the latest image from Docker Hub and dump the logs to a local folder with -v /local/path:/logs.
If you can post the output from the default startup, I can try to see what's happening.