Lightning-AI / litgpt

Pretrain, finetune, deploy 20+ LLMs on your own data. Uses state-of-the-art techniques: flash attention, FSDP, 4-bit, LoRA, and more.

Home Page: https://lightning.ai

RuntimeError: Error(s) in loading state_dict for LlamaModel: size mismatch for model.layers.0.self_attn.q_proj.weight: copying a param with shape torch.Size([768, 2048]) from checkpoint, the shape in current model is torch.Size([2048, 2048]).

Hasan-Syed25 opened this issue

Hi! I need a little help here with loading a pretrained model using AutoModel.from_pretrained.

I pretrained a TinyLlama-style model with the following model_config.yaml:

bias: false
block_size: 2048
gelu_approximate: none
head_size: 48
hf_config:
  name: TinyLlama-0.5B
intermediate_size: 2048
lm_head_bias: false
mlp_class_name: LLaMAMLP
n_embd: 2048
n_expert: 0
n_expert_per_token: 0
n_head: 16
n_layer: 24
n_query_groups: 4
name: tinyllama500m
norm_class_name: RMSNorm
norm_eps: 1.0e-05
padded_vocab_size: 32256
padding_multiple: 512
parallel_residual: false
rope_base: 10000
rope_condense_ratio: 1
rotary_percentage: 1.0
scale_embeddings: false
shared_attention_norm: false
vocab_size: 32000
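
For context, here is the shape arithmetic I believe applies (just a sketch; the numbers come from the config above and from the error below):

# Attention projection shapes implied by the litgpt config above (sketch).
n_embd = 2048           # hidden size
n_head = 16             # attention heads
head_size = 48          # per-head dimension, set explicitly in the config
n_query_groups = 4      # grouped-query attention

q_out = n_head * head_size            # 16 * 48 = 768
kv_out = n_query_groups * head_size   # 4 * 48 = 192

print("q_proj weight:", (q_out, n_embd))     # (768, 2048) -- matches the checkpoint
print("k/v proj weight:", (kv_out, n_embd))  # (192, 2048)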

I first consolidated the pretrained weights into lit_model.pth and then converted them to a Hugging Face-compatible state dict using the following two commands:

!litgpt convert pretrained_checkpoint \
  --checkpoint_dir ../Apex-500M \
  --output_dir checkpoints/tiny-llama/final

!litgpt convert from_litgpt \
  --checkpoint_dir ./checkpoints/tiny-llama/final \
  --output_dir converted_dir
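
As a sanity check, a small sketch (assuming the converted file is converted_dir/model.pth, as used below) to confirm what shapes actually ended up in the converted checkpoint:

import torch

# Print the attention projection shapes stored in the converted checkpoint.
state_dict = torch.load("converted_dir/model.pth", map_location="cpu")
for key, tensor in state_dict.items():
    if "layers.0.self_attn" in key:
        print(key, tuple(tensor.shape))
# Per the error below, q_proj.weight comes out as (768, 2048).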

I am then trying to load the model with the following configuration and am running into a size-mismatch error:

import torch
from transformers import AutoModel

state_dict = torch.load('converted_dir/model.pth')
model = AutoModel.from_pretrained(
    "TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T",
    num_hidden_layers=24,
    num_attention_heads=16,
    hidden_size=2048,
    vocab_size=32256,
    state_dict=state_dict,
)

ERROR:

RuntimeError: Error(s) in loading state_dict for LlamaModel:
size mismatch for model.layers.0.self_attn.q_proj.weight: copying a param with shape torch.Size([768, 2048]) from checkpoint, the shape in current model is torch.Size([2048, 2048]).

POSSIBLE EXPLANATION:

In torch.Size([768, 2048]), the 768 most likely comes from num_attention_heads (16) × head_size (48) = 768, i.e. the per-head dimension I set in the litgpt config. However, there is no parameter for the head size that can be passed to AutoModel.from_pretrained, so the model it builds derives the head dimension as hidden_size // num_attention_heads = 2048 // 16 = 128, which gives the expected [2048, 2048] q_proj weight.
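
For what it's worth, a rough sketch of what I think would be needed for the shapes to line up: build the Llama config by hand instead of overriding the TinyLlama-1.1B config, and set the head dimension explicitly. This assumes a recent transformers release whose LlamaConfig accepts a head_dim argument (older versions always derive it as hidden_size // num_attention_heads = 128, which is exactly why the expected q_proj is [2048, 2048]); num_key_value_heads=4 and head_dim=48 mirror n_query_groups and head_size from my config above.

import torch
from transformers import LlamaConfig, LlamaForCausalLM

# Sketch: construct a model whose attention geometry matches the checkpoint
# instead of inheriting TinyLlama-1.1B's (hidden_size // num_heads = 128).
# Assumes the installed transformers version supports LlamaConfig(head_dim=...).
config = LlamaConfig(
    vocab_size=32256,
    hidden_size=2048,
    intermediate_size=2048,
    num_hidden_layers=24,
    num_attention_heads=16,
    num_key_value_heads=4,   # n_query_groups in the litgpt config
    head_dim=48,             # head_size in the litgpt config
    rms_norm_eps=1e-5,
    rope_theta=10000.0,
    max_position_embeddings=2048,
)

model = LlamaForCausalLM(config)
state_dict = torch.load("converted_dir/model.pth", map_location="cpu")
missing, unexpected = model.load_state_dict(state_dict, strict=False)
print("missing:", missing)
print("unexpected:", unexpected)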