marella / ctransformers

Python bindings for Transformer models implemented in C/C++ using the GGML library.

AutoModelForCausalLM.from_pretrained(.., gpu_layers=..) gives Windows Error 0xc000001d

JeremyBickel opened this issue

Any use of "gpu_layers" crashes it.
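
A minimal reproduction looks roughly like this (a sketch; the model path is a placeholder, and the model file and gpu_layers value mirror the call in the traceback below):

from ctransformers import AutoModelForCausalLM

# Placeholder: point this at whatever local directory or Hub repo holds the GGUF file.
model_path = "path/to/model"

# Any nonzero gpu_layers value triggers the crash; dropping the argument works (CPU only).
llm = AutoModelForCausalLM.from_pretrained(
    model_path,
    model_file="xwin-lm-13b-v0.2.Q6_K.gguf",
    gpu_layers=10,
)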

CUDA is working:

(ct) C:\Users\Jeremy\Documents>python
Python 3.11.6 (tags/v3.11.6:8b6ee5b, Oct 2 2023, 14:57:12) [MSC v.1935 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.

>>> import torch
>>> torch.cuda.is_available()
True


I just made a new Python environment:

python -m venv python_envs\llamacpp
python_envs\llamacpp\Scripts\activate

git clone --recurse-submodules https://github.com/abetlen/llama-cpp-python.git
python -m pip install --upgrade pip
cd llama-cpp-python
pip install -e .[all]

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

pip install ctransformers[cuda]

~~~THEN~~~

(llamacpp) C:\Users\Jeremy\Documents>python process_dataset.py
Fetching 1 files: 100%|██████████████████████████████████████████████████████████████████████████| 1/1 [00:00<?, ?it/s]
Fetching 1 files: 100%|█████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 999.12it/s]
Traceback (most recent call last):
  File "C:\Users\Jeremy\Documents\process_dataset.py", line 46, in <module>
    model = AutoModelForCausalLM.from_pretrained(model_path, gpu_layers=10, model_file="xwin-lm-13b-v0.2.Q6_K.gguf")#, model_type='gguf', max_new_tokens=3500, repetition_penalty=1.07, temperature=0.1, top_k=15, top_p=0.97, last_n_tokens=40, seed=142857, stream=False, reset=False, batch_size=512, threads=10, context_length=4096) #, gpu_layers=12
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Jeremy\python_envs\llamacpp\Lib\site-packages\ctransformers\hub.py", line 175, in from_pretrained
    llm = LLM(
          ^^^^
  File "C:\Users\Jeremy\python_envs\llamacpp\Lib\site-packages\ctransformers\llm.py", line 247, in __init__
    self._llm = self._lib.ctransformers_llm_create(
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: [WinError -1073741795] Windows Error 0xc000001d

~~~THEN~~~

I edited the offending line to remove "gpu_layers=10," and it worked with the CPU-only setup:

(llamacpp) C:\Users\Jeremy\Documents>python process_dataset.py
Fetching 1 files: 100%|████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 1010.68it/s]
Fetching 1 files: 100%|█████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 999.83it/s]
p>  Tag each word of the following SENTENCE using these semantic tags: agent, main-action, purpose, direct-object, theme, relative pronon, status-action, location, recipient, comparison marker, referential-agent, transitive-action, means marker, time-indicator, description, status, conjunction, theme. SENTENCE: And after these things I saw four angels standing on the four corners of the earth, holding the four winds of the earth, that the wind should not blow on the earth, nor on the sea, nor on any tree.

...
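
For reference, the edited line is just the same call without gpu_layers (a sketch, using the same placeholder model_path as above):

model = AutoModelForCausalLM.from_pretrained(
    model_path,
    model_file="xwin-lm-13b-v0.2.Q6_K.gguf",
)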

Same issue here. Windows 11, CUDA 11.8/12.3, Python 3.12/3.11, model llama-2-13b-chat.Q8_0.gguf, same output.

Update: got it fixed. It turns out that my CPU does not support AVX2, so I cloned the repo, edited the CMake config to use only AVX, and installed it from source that way. After that the model runs. Install CMake and take a look at the guidance branch; the installation guide shows you how to do it.
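
For what it's worth, 0xc000001d is STATUS_ILLEGAL_INSTRUCTION, which fits a binary built with AVX2 running on a CPU that lacks it. A quick way to check what your CPU reports (a sketch; assumes the third-party py-cpuinfo package, installed with pip install py-cpuinfo):

import cpuinfo  # provided by the py-cpuinfo package

flags = cpuinfo.get_cpu_info().get("flags", [])
print("AVX: ", "avx" in flags)
print("AVX2:", "avx2" in flags)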

P.S. I also ran into "Cublas Error: 13". It turned out to be related to having multiple GPUs: one has to specify which GPU to use, even though the program prints that one has been selected. To do so, run this in PowerShell:

$env:CUDA_VISIBLE_DEVICES=1

This selects the GPU with index 1 (indices are zero-based, so this is the second GPU).
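
The same thing can be done from inside the script, as long as the variable is set before anything initializes CUDA (a sketch):

import os

# Must be set before the first CUDA context is created, i.e. before the model
# is loaded or torch touches the GPU.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

from ctransformers import AutoModelForCausalLM
# ...load the model as usual; only the GPU with index 1 will be visible.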