Error invalid device ordinal at line 393
matt-seb-ho opened this issue · comments
Hi, thanks for releasing and supporting this package! The results are super impressive, so I'm trying to get the quantization benefits in my own projects by running QLoRA on Llama-7B. Using a slightly modified finetune.sh script, I'm hitting the following error:
...
Adding special tokens.
adding LoRA modules...
loaded model
Splitting train dataset in train and validation according to `eval_dataset_size`
Found cached dataset json (/mnt/hdd/msho/.cache/huggingface/datasets/json/default-449d839f7091c29e/0.0.0/e347ab1c932092252e717ff3f949105a4dd28b27e842dd53157d2f72e276c2e4)
100%|████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 494.09it/s]
trainable params: 79953920.0 || all params: 3660328960 || trainable: 2.1843370056007205
torch.float32 422326272 0.11537932153507864
torch.uint8 3238002688 0.8846206784649213
0%| | 0/10000 [00:00<?, ?it/s]
Error invalid device ordinal at line 393 in file /mmfs1/gscratch/zlab/timdettmers/git/bitsandbytes/csrc/pythonInterface.c
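For what it's worth, my understanding is that "invalid device ordinal" is CUDA's error for addressing a GPU index outside the set of visible devices, and that CUDA remaps whatever CUDA_VISIBLE_DEVICES lists down to ordinals 0..n-1. A minimal sketch of that remapping (hypothetical helper, not the bitsandbytes code):

```python
import os

def visible_ordinals(env=None):
    """Return the valid device ordinals implied by CUDA_VISIBLE_DEVICES.

    CUDA renumbers the visible devices starting at 0, so a process launched
    with CUDA_VISIBLE_DEVICES=2,3 must address them as ordinals 0 and 1;
    asking for ordinal 2 there raises "invalid device ordinal".
    """
    env = os.environ if env is None else env
    value = env.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        # Unset: all physical devices are visible under their real indices.
        return None
    ids = [v for v in value.split(",") if v.strip()]
    return list(range(len(ids)))

# With two visible devices, only ordinals 0 and 1 are valid.
print(visible_ordinals({"CUDA_VISIBLE_DEVICES": "2,3"}))  # [0, 1]
```

So one thing I'm checking is whether my launch script exports a CUDA_VISIBLE_DEVICES that leaves fewer devices visible than the device index being requested.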
I'm working on a server with CUDA 11.7 (top of the nvidia-smi readout):
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01 Driver Version: 515.65.01 CUDA Version: 11.7 |
|-------------------------------+----------------------+----------------------+
and I'm fairly certain I have the right dependencies. My pip freeze matches requirements.txt everywhere except bitsandbytes, where I'm using a slightly newer version (0.41.1) because 0.40 fails on import for me (it detects CUDA 10 for some reason):
bitsandbytes==0.41.1
transformers==4.31.0
peft==0.4.0
accelerate==0.21.0
einops==0.6.1
evaluate==0.4.0
scikit-learn==1.2.2
sentencepiece==0.1.99
wandb==0.15.3
I'm aware there was a similar issue in the past (#3), but that one seems to have been resolved, so I'm not sure why I'm hitting the same error.
Any suggestions?
Thanks in advance!