karpathy / minGPT

A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training

AssertionError when running generate.ipynb with default parameters

jacquesqiao opened this issue · comments

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[14], line 2
      1 if use_mingpt:
----> 2     model = GPT.from_pretrained(model_type)
      3 else:
      4     model = GPT2LMHeadModel.from_pretrained(model_type)

File ~/project/llm/minGPT/mingpt/model.py:200, in GPT.from_pretrained(cls, model_type)
    197 transposed = ['attn.c_attn.weight', 'attn.c_proj.weight', 'mlp.c_fc.weight', 'mlp.c_proj.weight']
    198 # basically the openai checkpoints use a "Conv1D" module, but we only want to use a vanilla nn.Linear.
    199 # this means that we have to transpose these weights when we import them
--> 200 assert len(keys) == len(sd)
    201 for k in keys:
    202     if any(k.endswith(w) for w in transposed):
    203         # special treatment for the Conv1D weights we need to transpose

AssertionError: 
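For reference, the keys and sd in the failing assert are defined earlier in GPT.from_pretrained; a rough sketch of those definitions (paraphrased from the repo, so exact lines may differ):

# earlier in GPT.from_pretrained, roughly:
sd = model.state_dict()                       # minGPT's own parameters and buffers
model_hf = GPT2LMHeadModel.from_pretrained(model_type)
sd_hf = model_hf.state_dict()                 # hugging face checkpoint
keys = [k for k in sd_hf if not k.endswith('attn.masked_bias')]  # ignore these

So the assert fails whenever the two state dicts end up with different key counts.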

Same problem here. Maybe huggingface updated their pretrained model? Did you find a solution?

I encountered the same problem and found it was caused by the parameter count: comparing the keys of sd and sd_hf shows the mismatch comes from Hugging Face updating the GPT2Attention source code.
I added self.register_buffer("masked_bias", torch.tensor(-1e4), persistent=False) in model.py, and that solved it!

Where did you add that line of code in model.py?

My fix was the following change in model.py:

# attn.bias isn't in the hugging face state dict, so we can't check for it
assert len(keys) == len([k for k in sd if not k.endswith('.attn.bias')])
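In context, that change filters minGPT's causal-mask buffers out of the count, since recent transformers releases no longer include attn.bias (or attn.masked_bias) in the checkpoint's state dict. A sketch of how the patched section of from_pretrained might read as a whole (based on the code shown in the traceback above; exact details may differ):

# attn.bias isn't in the hugging face state dict, so we can't check for it
assert len(keys) == len([k for k in sd if not k.endswith('.attn.bias')])
for k in keys:
    if any(k.endswith(w) for w in transposed):
        # special treatment for the Conv1D weights we need to transpose
        assert sd_hf[k].shape[::-1] == sd[k].shape
        with torch.no_grad():
            sd[k].copy_(sd_hf[k].t())
    else:
        # vanilla copy for the other parameters
        assert sd_hf[k].shape == sd[k].shape
        with torch.no_grad():
            sd[k].copy_(sd_hf[k])

With that, GPT.from_pretrained(model_type) should load again without pinning an older transformers version.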