[BUG] Pretrained GPT2 model has an incorrect size compared with the config file.
alphaGem opened this issue · comments
Describe the bug
File "example.py", line 303, in <module>
main()
File "example.py", line 93, in main
gpt = GPT2.from_pretrained("gpt2-base", config=gpt_config)
File "/home/chenyanxu/miniconda3/envs/BMTrain/lib/python3.8/site-packages/model_center/model/basemodel.py", line 33, in from_pretrained
bmt.load(model, os.path.join(path, 'pytorch_model.pt'), strict=False)
File "/home/chenyanxu/miniconda3/envs/BMTrain/lib/python3.8/site-packages/bmtrain-0.1.8-py3.8-linux-x86_64.egg/bmtrain/store.py", line 202, in load
ret = model.load_state_dict(
File "/home/chenyanxu/miniconda3/envs/BMTrain/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1497, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for GPT2:
size mismatch for input_embedding.weight: copying a param with shape torch.Size([50258, 768]) from checkpoint, the shape in current model is torch.Size([38597376]).
Minimal steps to reproduce
gpt_config = GPT2Config.from_pretrained("gpt2-base")
gpt = GPT2.from_pretrained("gpt2-base", config=gpt_config)
Expected behavior
Successfully loads the model.
Environment:
model-center 0.1.3, torch 1.11.0, cuda 10.2
Additional information
If I change the vocab size in my locally cached config to 50258, the checkpoint loads, but the model doesn't work correctly: the logits at the last vocab index (outputs[:,:,-1]) are all zero, which makes them significantly larger than all the other logit values. Using the slice outputs[:,:,:-1] as the real output seems to work around the problem.
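The workaround described above can be sketched as follows. This is a minimal illustration with random data, not the actual model output: the shapes and the all-zero extra vocab entry mirror what the report describes (50257 real entries plus one spurious zero row).

```python
import numpy as np

# Hypothetical logits from a model whose embedding has one extra (all-zero)
# vocab entry, giving shape (batch, seq_len, vocab_size + 1).
vocab_size = 50257
logits = np.random.randn(2, 4, vocab_size + 1).astype(np.float32)
logits[:, :, -1] = 0.0  # the spurious last entry is always zero

# Workaround: slice off the extra entry before softmax/argmax.
real_logits = logits[:, :, :-1]
assert real_logits.shape == (2, 4, vocab_size)
```

Slicing is cheap (it returns a view in both numpy and PyTorch), so applying it on every forward pass is a reasonable stopgap until the checkpoint itself is fixed.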
This is caused by the new release; you can pull the latest code and it will work as you expect. Also make sure you delete the .cache/model_center/gpt2-base/ directory, because we updated the config JSON on the cloud too.
I have tried the following actions respectively:
1. pip uninstall model-center and then pip install model-center
2. pip uninstall model-center, then clone the latest code and run python3 setup.py install in the code folder
Before each attempt to run my code, I delete the ~/.cache/model_center folder.
However, none of the above actions solves the problem.
Are you sure that the pre-trained GPT-2 base model on the cloud (the download path in utils/net_utils.py is https://openbmb.oss-cn-hongkong.aliyuncs.com/model_center/{path}, as far as I can see) has the correct vocab size of 50257 instead of 50258?
Sorry, we didn't update the checkpoint on the cloud before, so it was not compatible with the config JSON. The vocab size should be 50257, and the old checkpoint had an extra dimension with all zeros. The issue is now fixed: you can clean the .cache checkpoint directory and re-download the correct checkpoint by using the from_pretrained method.
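For anyone hitting a similar mismatch, a quick sanity check is to compare the embedding's first dimension in the downloaded state dict against the config's vocab size before loading. This is a sketch with simulated state dicts (numpy arrays standing in for the real tensors); the key name `input_embedding.weight` comes from the traceback above, but `check_vocab_size` is a hypothetical helper, not part of model-center.

```python
import numpy as np

def check_vocab_size(state_dict, config_vocab_size, key="input_embedding.weight"):
    """Return True if the checkpoint's embedding row count matches the config."""
    return state_dict[key].shape[0] == config_vocab_size

# Simulated old (broken) checkpoint: one extra all-zero vocab row (50258),
# versus the corrected checkpoint with the expected 50257 rows.
old_ckpt = {"input_embedding.weight": np.zeros((50258, 768), dtype=np.float32)}
new_ckpt = {"input_embedding.weight": np.zeros((50257, 768), dtype=np.float32)}

assert not check_vocab_size(old_ckpt, 50257)  # old checkpoint fails the check
assert check_vocab_size(new_ckpt, 50257)      # re-downloaded checkpoint passes
```

With a real PyTorch checkpoint you would obtain the state dict via `torch.load(...)` and compare against `config.vocab_size` the same way.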
This issue has been fixed