lyogavin / Anima

33B Chinese LLM, DPO QLORA, 100K context, AirLLM 70B inference with single 4GB GPU


MacBook "Torch not compiled with CUDA enabled" Error

LanLanBoom opened this issue · comments

This is my code:

from airllm import AutoModel

MAX_LENGTH = 128

model = AutoModel.from_pretrained("garage-bAInd/Platypus2-70B-instruct", compression='4bit', profiling_mode=True, delete_original=True)

input_text = [
    'What is the capital of United States?',
]
input_tokens = model.tokenizer(input_text,
                               return_tensors="pt",
                               return_attention_mask=False,
                               truncation=True,
                               max_length=MAX_LENGTH,
                               padding=False)

generation_output = model.generate(
    # note: torch tensors have no .mps() method; use .to("mps") to move to the Apple GPU
    input_tokens['input_ids'].to("mps"),
    max_new_tokens=20,
    use_cache=True,
    return_dict_in_generate=True)

output = model.tokenizer.decode(generation_output.sequences[0])

print(output)

This is the result I got:

File "/Users/xxx/miniconda3/envs/torch-playground/lib/python3.9/site-packages/airllm/airllm_llama_mlx.py", line 224, in __init__
    self.model_local_path, self.checkpoint_path = find_or_create_local_splitted_path(model_local_path_or_repo_id,
  File "/Users/xxx/miniconda3/envs/torch-playground/lib/python3.9/site-packages/airllm/utils.py", line 382, in find_or_create_local_splitted_path
    return Path(hf_cache_path), split_and_save_layers(hf_cache_path, layer_shards_saving_path,
  File "/Users/xxx/miniconda3/envs/torch-playground/lib/python3.9/site-packages/airllm/utils.py", line 303, in split_and_save_layers
    layer_state_dict = compress_layer_state_dict(layer_state_dict, compression)
  File "/Users/xxx/miniconda3/envs/torch-playground/lib/python3.9/site-packages/airllm/utils.py", line 162, in compress_layer_state_dict
    v_quant, quant_state = bnb.functional.quantize_nf4(v.cuda(), blocksize=64)
  File "/Users/xxx/miniconda3/envs/torch-playground/lib/python3.9/site-packages/torch/cuda/__init__.py", line 289, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled

I don't get it. Could anyone help me?

I get the same output... this doesn't seem resolved.

--> 162 v_quant, quant_state = bnb.functional.quantize_nf4(v.cuda(), blocksize=64)

It seems the compression path isn't supported on Mac: `quantize_nf4` moves the tensor with `v.cuda()` unconditionally, which raises this AssertionError whenever PyTorch was built without CUDA support.
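A possible workaround, based on the traceback above: the `v.cuda()` call only runs when `compression='4bit'` is passed, so on a Mac you would drop the `compression` argument and move input tensors with `.to(device)` instead of hard-coding `.cuda()` or `.mps()`. A minimal sketch of the device-selection part (the `pick_device` helper name is my own, not part of AirLLM):

```python
import torch

def pick_device() -> str:
    """Return the best available torch device string: cuda, mps, or cpu."""
    if torch.cuda.is_available():
        return "cuda"
    # torch.backends.mps exists on recent PyTorch builds; guard for older ones
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"

device = pick_device()
print(device)
```

With that in place, the call site would look like `model.generate(input_tokens['input_ids'].to(device), ...)`, and `AutoModel.from_pretrained(...)` would be called without `compression='4bit'` on machines where bitsandbytes' CUDA kernels are unavailable. This is a sketch under those assumptions, not a confirmed fix from the maintainers.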