Error with Llama3: ValueError: Trying to set a tensor of shape torch.Size([1024, 8192]) in "weight" (which has shape torch.Size([8192, 8192])), this look incorrect.
Cangshanqingshi commented
My server can't fetch the model from Hugging Face online, so I downloaded the PyTorch version of the model (instead of the safetensors version) from Hugging Face to run it locally. Because of this, I changed the model-loading code to point at the local path. The code after the change is below:
from airllm import AutoModel
MAX_LENGTH = 128
# could use hugging face model repo id:
# model = AutoModel.from_pretrained("garage-bAInd/Platypus2-70B-instruct")
# or use model's local path...
model = AutoModel.from_pretrained("./llama3_model", layer_shards_saving_path="./")
input_text = [
    'What is the capital of United States?',
    #'I like',
]
input_tokens = model.tokenizer(input_text,
    return_tensors="pt",
    return_attention_mask=False,
    truncation=True,
    max_length=MAX_LENGTH,
    padding=False)
generation_output = model.generate(
    input_tokens['input_ids'].cuda(),
    max_new_tokens=20,
    use_cache=True,
    return_dict_in_generate=True)
output = model.tokenizer.decode(generation_output.sequences[0])
print(output)
When I run this code, the model appears to load successfully, but a tensor shape mismatch is raised during generation. The output and error message are below:
found index file...
found_layers:{'model.embed_tokens.': True, 'model.layers.0.': True, 'model.layers.1.': True, 'model.layers.2.': True, 'model.layers.3.': True, 'model.layers.4.': True, 'model.layers.5.': True, 'model.layers.6.': True, 'model.layers.7.': True, 'model.layers.8.': True, 'model.layers.9.': True, 'model.layers.10.': True, 'model.layers.11.': True, 'model.layers.12.': True, 'model.layers.13.': True, 'model.layers.14.': True, 'model.layers.15.': True, 'model.layers.16.': True, 'model.layers.17.': True, 'model.layers.18.': True, 'model.layers.19.': True, 'model.layers.20.': True, 'model.layers.21.': True, 'model.layers.22.': True, 'model.layers.23.': True, 'model.layers.24.': True, 'model.layers.25.': True, 'model.layers.26.': True, 'model.layers.27.': True, 'model.layers.28.': True, 'model.layers.29.': True, 'model.layers.30.': True, 'model.layers.31.': True, 'model.layers.32.': True, 'model.layers.33.': True, 'model.layers.34.': True, 'model.layers.35.': True, 'model.layers.36.': True, 'model.layers.37.': True, 'model.layers.38.': True, 'model.layers.39.': True, 'model.layers.40.': True, 'model.layers.41.': True, 'model.layers.42.': True, 'model.layers.43.': True, 'model.layers.44.': True, 'model.layers.45.': True, 'model.layers.46.': True, 'model.layers.47.': True, 'model.layers.48.': True, 'model.layers.49.': True, 'model.layers.50.': True, 'model.layers.51.': True, 'model.layers.52.': True, 'model.layers.53.': True, 'model.layers.54.': True, 'model.layers.55.': True, 'model.layers.56.': True, 'model.layers.57.': True, 'model.layers.58.': True, 'model.layers.59.': True, 'model.layers.60.': True, 'model.layers.61.': True, 'model.layers.62.': True, 'model.layers.63.': True, 'model.layers.64.': True, 'model.layers.65.': True, 'model.layers.66.': True, 'model.layers.67.': True, 'model.layers.68.': True, 'model.layers.69.': True, 'model.layers.70.': True, 'model.layers.71.': True, 'model.layers.72.': True, 'model.layers.73.': True, 'model.layers.74.': True, 
'model.layers.75.': True, 'model.layers.76.': True, 'model.layers.77.': True, 'model.layers.78.': True, 'model.layers.79.': True, 'model.norm.': True, 'lm_head.': True}
saved layers already found in splitted_model
new version of transfomer, no need to use BetterTransformer, try setting attn impl to sdpa...
either BetterTransformer or attn_implementation='sdpa' is available, creating model directly
new version of transfomer, no need to use BetterTransformer, try setting attn impl to sdpa...
either BetterTransformer or attn_implementation='sdpa' is available, creating model directly
running layers(self.running_device): 1%| | 1/83 [00:00<00:35, 2.29it/s]
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[3], line 22
     10 input_text = [
     11     'What is the capital of United States?',
     12     #'I like',
     13 ]
     15 input_tokens = model.tokenizer(input_text,
     16     return_tensors="pt",
     17     return_attention_mask=False,
     18     truncation=True,
     19     max_length=MAX_LENGTH,
     20     padding=False)
---> 22 generation_output = model.generate(
     23     input_tokens['input_ids'].cuda(),
     24     max_new_tokens=20,
     25     use_cache=True,
     26     return_dict_in_generate=True)
     28 output = model.tokenizer.decode(generation_output.sequences[0])
     30 print(output)
File /data2/lhy/anaconda3/envs/mm2024-ChinaOpen/lib/python3.10/site-packages/torch/utils/_contextlib.py:115, in context_decorator.<locals>.decorate_context(*args, **kwargs)
    112 @functools.wraps(func)
    113 def decorate_context(*args, **kwargs):
    114     with ctx_factory():
...
File /data2/lhy/anaconda3/envs/mm2024-ChinaOpen/lib/python3.10/site-packages/accelerate/utils/modeling.py
    349 if dtype is None:
    350     # For compatibility with PyTorch load_state_dict which converts state dict dtype to existing dtype in model
    351     value = value.to(old_value.dtype)
ValueError: Trying to set a tensor of shape torch.Size([1024, 8192]) in "weight" (which has shape torch.Size([8192, 8192])), this look incorrect.
I would like to know how to resolve this issue.
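For reference, the two shapes in the error look consistent with a grouped-query attention mismatch, though the post does not show config.json, so this is only a guess. The sketch below works out the attention projection shapes for a Llama-style layer. The values 8192 (hidden size), 64 (attention heads) and 8 (key/value heads) are the published Llama 3 70B settings and are assumed here, not read from ./llama3_model. The helper name attention_weight_shapes is made up for illustration and is not part of airllm or transformers.

```python
# Sketch: derive the projection weight shapes a Llama-style attention block
# expects from its config values. The numbers used below are the published
# Llama 3 70B settings; they are an assumption, since the post does not show
# the contents of ./llama3_model/config.json.

def attention_weight_shapes(hidden_size, num_attention_heads, num_key_value_heads):
    """Return {name: (out_features, in_features)} for the q/k/v/o projections."""
    head_dim = hidden_size // num_attention_heads
    kv_out = num_key_value_heads * head_dim  # K/V projections shrink under GQA
    return {
        "q_proj": (hidden_size, hidden_size),
        "k_proj": (kv_out, hidden_size),
        "v_proj": (kv_out, hidden_size),
        "o_proj": (hidden_size, hidden_size),
    }

# Shapes stored in a checkpoint that uses grouped-query attention (8 KV heads).
checkpoint = attention_weight_shapes(8192, 64, 8)

# Shapes a model would allocate if the KV head count were treated as equal to
# the number of attention heads (hypothetical scenario, 64 KV heads).
fallback = attention_weight_shapes(8192, 64, 64)

print("checkpoint k_proj:", checkpoint["k_proj"])  # (1024, 8192)
print("fallback   k_proj:", fallback["k_proj"])    # (8192, 8192)
```

Under these assumptions, the [1024, 8192] tensor matches a k_proj or v_proj weight from the checkpoint, and the [8192, 8192] slot matches a layer built with 64 key/value heads. If that is what is happening here, it may be worth checking that num_key_value_heads in the local config.json is 8 and that the installed transformers version supports it.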