lyogavin / Anima

33B Chinese LLM, DPO QLORA, 100K context, AirLLM 70B inference with single 4GB GPU

AirLLMLlamaMlx fails to load model with mlx==0.0.7

jakule opened this issue

The sample code below (taken from the AirLLM examples):

from airllm import AirLLMLlamaMlx
import mlx.core as mx

MAX_LENGTH = 128

# Load the model through the MLX backend.
model = AirLLMLlamaMlx("garage-bAInd/Platypus2-7B")

input_text = [
    'I like',
]

# Tokenize to NumPy arrays so the ids can be wrapped in an mx.array.
input_tokens = model.tokenizer(input_text,
                               return_tensors="np",
                               return_attention_mask=False,
                               truncation=True,
                               max_length=MAX_LENGTH,
                               padding=False)

# Generate a few tokens from the prompt.
generation_output = model.generate(
    mx.array(input_tokens['input_ids']),
    max_new_tokens=3,
    use_cache=True,
    return_dict_in_generate=True)

print(generation_output)

fails to load the model after upgrading to mlx 0.0.7 with the following error:

File ~/venv/lib/python3.11/site-packages/airllm/persist/mlx_model_persister.py:96, in MlxModelPersister.load_model(self, layer_name, path)
     94 #available = psutil.virtual_memory().available / 1024 / 1024
     95 #print(f"start loading: {to_load_path}, before loading: {available:.02f}")
---> 96 layer_state_dict = mx.load(to_load_path)
     97 #available = psutil.virtual_memory().available / 1024 / 1024
     98 #print(f"loaded {layer_name}, available mem: {available:.02f}")
    100 layer_state_dict = map_torch_to_mlx(layer_state_dict)

ValueError: [load] Input must be a file-like object, or string

The same code works fine with mlx 0.0.6.
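
The traceback points at the mx.load(to_load_path) call in MlxModelPersister.load_model. One plausible explanation (an assumption on my part, not confirmed in this thread) is that mlx 0.0.7 tightened the argument check in mx.load so that it now accepts only a string or a file-like object, rejecting the pathlib.Path objects that 0.0.6 tolerated. A minimal sketch of the suspected behavior and the corresponding one-line fix:

import mlx.core as mx
from pathlib import Path

# Write a tiny weight file, then reload it the way the persister does.
path = Path("layer_test.npz")
mx.savez(str(path), weight=mx.zeros((2, 2)))

# Assumption: under mlx 0.0.7, mx.load(path) with a pathlib.Path raises
# "ValueError: [load] Input must be a file-like object, or string".
# Converting the path to a plain string satisfies the stricter check:
layer_state_dict = mx.load(str(path))
print(layer_state_dict["weight"].shape)  # shape of the reloaded array

If that is indeed the cause, wrapping to_load_path in str(...) inside load_model should restore compatibility. In the meantime, pinning the dependency (pip install mlx==0.0.6) keeps the example working, since the issue does not occur on 0.0.6.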