lyogavin / Anima

33B Chinese LLM, DPO, QLoRA, 100K context, AirLLM 70B inference with a single 4GB GPU

safetensors_rust.SafetensorError: Error while deserializing header: MetadataIncompleteBuffer

chuangzhidan opened this issue

(swift) root@gptai:/media/data/xp# python airllm_test.py
>>>> cache_utils installed
found index file...
found_layers:{'model.embed_tokens.': True, 'model.layers.0.': True, 'model.layers.1.': True, 'model.layers.2.': True, 'model.layers.3.': True, 'model.layers.4.': True, 'model.layers.5.': True, 'model.layers.6.': True, 'model.layers.7.': True, 'model.layers.8.': True, 'model.layers.9.': True, 'model.layers.10.': True, 'model.layers.11.': True, 'model.layers.12.': True, 'model.layers.13.': True, 'model.layers.14.': True, 'model.layers.15.': True, 'model.layers.16.': True, 'model.layers.17.': True, 'model.layers.18.': True, 'model.layers.19.': True, 'model.layers.20.': True, 'model.layers.21.': True, 'model.layers.22.': True, 'model.layers.23.': True, 'model.layers.24.': True, 'model.layers.25.': True, 'model.layers.26.': True, 'model.layers.27.': True, 'model.layers.28.': True, 'model.layers.29.': True, 'model.layers.30.': True, 'model.layers.31.': True, 'model.layers.32.': True, 'model.layers.33.': True, 'model.layers.34.': True, 'model.layers.35.': True, 'model.layers.36.': True, 'model.layers.37.': True, 'model.layers.38.': True, 'model.layers.39.': True, 'model.layers.40.': True, 'model.layers.41.': True, 'model.layers.42.': False, 'model.layers.43.': False, 'model.layers.44.': False, 'model.layers.45.': False, 'model.layers.46.': False, 'model.layers.47.': False, 'model.layers.48.': False, 'model.layers.49.': False, 'model.layers.50.': False, 'model.layers.51.': False, 'model.layers.52.': False, 'model.layers.53.': False, 'model.layers.54.': False, 'model.layers.55.': False, 'model.layers.56.': False, 'model.layers.57.': False, 'model.layers.58.': False, 'model.layers.59.': False, 'model.layers.60.': False, 'model.layers.61.': False, 'model.layers.62.': False, 'model.layers.63.': False, 'model.layers.64.': False, 'model.layers.65.': False, 'model.layers.66.': False, 'model.layers.67.': False, 'model.layers.68.': False, 'model.layers.69.': False, 'model.layers.70.': False, 'model.layers.71.': False, 'model.layers.72.': False, 'model.layers.73.': False, 'model.layers.74.': False, 'model.layers.75.': False, 'model.layers.76.': False, 'model.layers.77.': False, 'model.layers.78.': False, 'model.layers.79.': False, 'model.norm.': False, 'lm_head.': False}
some layer splits found, some are not, re-save all layers in case there's some corruptions.
0%| | 0/83 [00:00<?, ?it/s]Loading shard 1/30
2%|██▊ | 2/83 [00:00<00:09, 8.75it/s]Loading shard 2/30
5%|█████▌ | 4/83 [00:00<00:07, 9.93it/s]Loading shard 3/30
10%|███████████▏ | 8/83 [00:00<00:06, 11.55it/s]Loading shard 4/30
12%|█████████████▊ | 10/83 [00:00<00:06, 11.96it/s]Loading shard 5/30
14%|████████████████▋ | 12/83 [00:01<00:05, 13.08it/s]Loading shard 6/30
19%|██████████████████████▏ | 16/83 [00:01<00:04, 13.99it/s]Loading shard 7/30
22%|████████████████████████▉ | 18/83 [00:01<00:04, 13.88it/s]Loading shard 8/30
27%|██████████████████████████████▍ | 22/83 [00:01<00:04, 12.61it/s]Loading shard 9/30
29%|█████████████████████████████████▎ | 24/83 [00:01<00:04, 12.30it/s]Loading shard 10/30
31%|████████████████████████████████████ | 26/83 [00:02<00:04, 11.53it/s]Loading shard 11/30
36%|█████████████████████████████████████████▌ | 30/83 [00:02<00:05, 9.72it/s]Loading shard 12/30
40%|█████████████████████████████████████████████▋ | 33/83 [00:02<00:05, 9.47it/s]Loading shard 13/30
43%|█████████████████████████████████████████████████▉ | 36/83 [00:03<00:04, 9.40it/s]Loading shard 14/30
47%|██████████████████████████████████████████████████████ | 39/83 [00:03<00:04, 9.05it/s]Loading shard 15/30
49%|████████████████████████████████████████████████████████▊ | 41/83 [00:03<00:04, 8.93it/s]Loading shard 16/30
52%|███████████████████████████████████████████████████████████▌ | 43/83 [00:04<00:04, 8.70it/s]saved as: /media/data/xgp/model/Meta-Llama-3-70B-Instruct/splitted_model/model.layers.42.safetensors
53%|████████████████████████████████████████████████████████████▉ | 44/83 [00:06<00:26, 1.50it/s]Loading shard 17/30
53%|████████████████████████████████████████████████████████████▉ | 44/83 [00:06<00:05, 7.21it/s]
Traceback (most recent call last):
  File "airllm_test.py", line 7, in <module>
    model = AutoModel.from_pretrained("/media/data/xp/model/Meta-Llama-3-70B-Instruct")
  File "/root/anaconda3/envs/swift/lib/python3.8/site-packages/airllm/auto_model.py", line 54, in from_pretrained
    return class_(pretrained_model_name_or_path, *inputs, **kwargs)
  File "/root/anaconda3/envs/swift/lib/python3.8/site-packages/airllm/airllm.py", line 9, in __init__
    super(AirLLMLlama2, self).__init__(*args, **kwargs)
  File "/root/anaconda3/envs/swift/lib/python3.8/site-packages/airllm/airllm_base.py", line 104, in __init__
    self.model_local_path, self.checkpoint_path = find_or_create_local_splitted_path(model_local_path_or_repo_id,
  File "/root/anaconda3/envs/swift/lib/python3.8/site-packages/airllm/utils.py", line 351, in find_or_create_local_splitted_path
    return Path(model_local_path_or_repo_id), split_and_save_layers(model_local_path_or_repo_id, layer_shards_saving_path,
  File "/root/anaconda3/envs/swift/lib/python3.8/site-packages/airllm/utils.py", line 297, in split_and_save_layers
    state_dict.update(load_file(to_load, device='cpu'))
  File "/root/anaconda3/envs/swift/lib/python3.8/site-packages/safetensors/torch.py", line 311, in load_file
    with safe_open(filename, framework="pt", device=device) as f:
safetensors_rust.SafetensorError: Error while deserializing header: MetadataIncompleteBuffer
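
The failure happens inside load_file() right after "Loading shard 17/30", so the truncated file could be either a half-written split in splitted_model or one of the original model shards (e.g. an incomplete download). A minimal diagnostic sketch: the script name and the source checkpoint path are assumptions taken from the reproduction script, and probe_shards.py is hypothetical, not part of AirLLM:

# probe_shards.py -- hypothetical diagnostic, not part of AirLLM
from pathlib import Path
from safetensors import safe_open

MODEL_DIR = Path("/media/data/xp/model/Meta-Llama-3-70B-Instruct")  # assumed from the script

# safe_open only parses the header, so this is cheap even for multi-GB shards;
# a truncated file raises the same MetadataIncompleteBuffer error seen above.
for shard in sorted(MODEL_DIR.glob("*.safetensors")):
    try:
        with safe_open(str(shard), framework="pt", device="cpu") as f:
            print(f"OK   {shard.name} ({len(f.keys())} tensors)")
    except Exception as exc:  # safetensors_rust.SafetensorError
        print(f"BAD  {shard.name}: {exc}")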

On the first run, the script was serializing each layer, but the process died partway through (the SSH session disconnected while idle), so only about half of the layers had been serialized. Re-running the reproduction script below (shown after the sketch) raises this error.
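
That timeline is consistent with the error: a .safetensors file begins with an 8-byte little-endian header length followed by that many bytes of JSON metadata, so a write that is killed mid-stream leaves a file shorter than its header claims, and deserializing it fails with MetadataIncompleteBuffer. A sketch of that check, reading only the first 8 bytes; header_peek.py and is_truncated are hypothetical names, but the layout is the documented safetensors format:

# header_peek.py -- hypothetical; relies only on the documented safetensors layout:
# 8-byte little-endian u64 header size, then that many bytes of JSON metadata.
import struct
from pathlib import Path

def is_truncated(path: Path) -> bool:
    size = path.stat().st_size
    if size < 8:
        return True  # not even a complete length prefix
    with path.open("rb") as fh:
        (header_len,) = struct.unpack("<Q", fh.read(8))
    # If the file ends before the JSON header does, safetensors reports
    # "Error while deserializing header: MetadataIncompleteBuffer".
    return size < 8 + header_len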

from airllm import AutoModel

MAX_LENGTH = 128

model = AutoModel.from_pretrained("xxxxxxxxxxxx")

input_text = [
    'xxxxxxxxxxxxxxx'
]

input_tokens = model.tokenizer(input_text,
                               return_tensors="pt",
                               return_attention_mask=False,
                               truncation=True,
                               max_length=MAX_LENGTH,
                               padding=False)

generation_output = model.generate(
    input_tokens['input_ids'].cuda(),
    max_new_tokens=20,
    use_cache=True,
    return_dict_in_generate=True)

output = model.tokenizer.decode(generation_output.sequences[0])
print(output)
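
A possible recovery path, sketched under the assumption that only the layer splits were truncated (if a source shard is the bad file, it would need to be re-downloaded instead): delete the unreadable files from the splitted_model cache and re-run under nohup or tmux so the serialization pass can finish even if SSH drops. clean_splits.py is a hypothetical helper, not AirLLM API, and the cache path is taken from the "saved as:" line in the log:

# clean_splits.py -- hypothetical helper, not part of AirLLM
from pathlib import Path
from safetensors import safe_open

# Assumed from the log above; note the log shows /media/data/xgp/... while
# the script passes /media/data/xp/... -- adjust to whichever actually exists.
SPLIT_DIR = Path("/media/data/xgp/model/Meta-Llama-3-70B-Instruct/splitted_model")

for f in sorted(SPLIT_DIR.glob("*.safetensors")):
    try:
        with safe_open(str(f), framework="pt", device="cpu"):
            pass  # header parsed: this split is intact, keep it
    except Exception as exc:
        # Truncated by the interrupted run; delete it so AirLLM re-saves the layer.
        print(f"removing truncated split {f.name}: {exc}")
        f.unlink()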
By the way, does this not support other models, such as Qwen1.5?