Fine tuning DistilBERT model OSError: Unable to load weights from pytorch checkpoint file.

Question

learnercat opened this issue 4 years ago

Hello! Thanks for the excellent tutorial on the awesome DistilBERT model. I followed it and reproduced it successfully. I then tried to load the saved model "pytorch_distilbert_news.bin" and the tokenizer vocabulary "vocab_distilbert_news.bin" for prediction (I renamed the model file to "pytorch_model.bin" and the vocabulary file to "vocab.txt"). I wrote and ran the script below, but got an error and couldn't find a solution.

# Importing the libraries needed
import pandas as pd
import torch
import transformers
from torch.utils.data import Dataset, DataLoader
from transformers import DistilBertModel, DistilBertTokenizer

test = "He'll give us really healthy competition with our goalkeepers and we wish him the very best of luck."

tokenizer = transformers.DistilBertTokenizer.from_pretrained('model/')
model = transformers.DistilBertModel.from_pretrained('model/')

---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
~/anaconda3/envs/hgface/lib/python3.7/site-packages/transformers/modeling_utils.py in from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
658 try:
--> 659 state_dict = torch.load(resolved_archive_file, map_location="cpu")
660 except Exception:
~/anaconda3/envs/hgface/lib/python3.7/site-packages/torch/serialization.py in load(f, map_location, pickle_module, **pickle_load_args)
579 return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
--> 580 return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
~/anaconda3/envs/hgface/lib/python3.7/site-packages/torch/serialization.py in _legacy_load(f, map_location, pickle_module, **pickle_load_args)
759 unpickler.persistent_load = persistent_load
--> 760 result = unpickler.load()
AttributeError: Can't get attribute 'DistillBERTClass' on <module '__main__'>
During handling of the above exception, another exception occurred:
OSError Traceback (most recent call last)
<ipython-input-24-827530780d10> in <module>
----> 1 model= transformers.DistilBertModel.from_pretrained('model/')

~/anaconda3/envs/hgface/lib/python3.7/site-packages/transformers/modeling_utils.py in from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
660 except Exception:
--> 662 "Unable to load weights from pytorch checkpoint file. "
663 "If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True. "
OSError: Unable to load weights from pytorch checkpoint file. If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True.

Please help. Thanks.

Abhishek Kumar Mishra · Answer 1 · Sun Jul 05 2020 13:08:43 GMT+0800 (China Standard Time)

Hi @learnercat, I am afraid the saving and loading methods provided by the Hugging Face library cannot be used directly for these models (especially in the multi-class classification case), since that experiment uses a custom network.

We added a few more layers on top of the DistilBERT model. Hence, to save and load the model you will have to rely on the methods native to the PyTorch library. See the PyTorch guide "Load and Save Pytorch models".

For the tokenizer, the Hugging Face procedure should work.

I tested the torch.save() and torch.load() methods and they worked for me. Let me know if this works or if you still get errors.
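
One caveat when torch.save() is given a whole model object rather than a state dict: the object is serialized with pickle, which records a reference to the class (module name plus class name) and not the class's source code. The sketch below uses only the standard library to show that mechanism in isolation. The DistillBERTClass defined here is a hypothetical plain-Python stand-in, not the tutorial's network, and deleting it is just a way to imitate a fresh notebook where the class was never defined.

```python
import pickle


class DistillBERTClass:
    """Hypothetical stand-in for the custom network; no PyTorch needed here."""

    def __init__(self):
        self.weights = [0.1, 0.2, 0.3]


# Pickling stores a reference to the class (its module and name) together
# with the instance data, not the source code of the class.
payload = pickle.dumps(DistillBERTClass())

# Imitate a fresh notebook in which the class was never defined.
del DistillBERTClass

try:
    pickle.loads(payload)
    error_message = None
except AttributeError as exc:
    # Unpickling has to look the class up by name and cannot find it.
    error_message = str(exc)

print(error_message)
```

Defining or importing the class before loading, or saving only the state dict, avoids this lookup failure.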

learnercat · Answer 2 · Mon Jul 06 2020 05:34:41 GMT+0800 (China Standard Time)

Hi @abhimishra91, thank you very much. I tried again with torch.load(), but it didn't work.

The model and tokenizer were saved as in your example:

output_model_file = './models/pytorch_distilbert_news.bin'
output_vocab_file = './models/vocab_distilbert_news.bin'
model_to_save = model
torch.save(model_to_save, output_model_file)
tokenizer.save_vocabulary(output_vocab_file)

The tokenizer loaded with the Hugging Face procedure, but with a warning. Loading the model then raised another AttributeError. Here are the script and the error details:

# Importing the libraries needed
import torch
import transformers
import numpy as np
from torch.utils.data import Dataset, DataLoader
from transformers import DistilBertModel, DistilBertTokenizer, DistilBertForSequenceClassification

tok_path = "models/vocab_distilbert_news.bin"
tokenizer = transformers.DistilBertTokenizer.from_pretrained(tok_path)
inputs = tokenizer.convert_tokens_to_ids("He'll give us really healthy competition with our goalkeepers and we wish him the very best of luck.")

"Calling DistilBertTokenizer.from_pretrained() with the path to a single file or url is deprecated"

model_path="models/pytorch_distilbert_news.bin"
model = torch.load(model_path)
outputs = model(inputs)
last_hidden_states = outputs[0]

Here are the errors:

AttributeError Traceback (most recent call last)
<ipython-input-29-48088a60bf09> in <module>
1 model_path="models/pytorch_distilbert_news.bin"
----> 2 model = torch.load(model_path)
3 outputs = model(inputs)
4 last_hidden_states = outputs[0]
~/anaconda3/envs/hgface/lib/python3.7/site-packages/torch/serialization.py in load(f, map_location, pickle_module, **pickle_load_args)
578 return torch.jit.load(f)
579 return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
--> 580 return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
~/anaconda3/envs/hgface/lib/python3.7/site-packages/torch/serialization.py in _legacy_load(f, map_location, pickle_module, **pickle_load_args)
758 unpickler = pickle_module.Unpickler(f, **pickle_load_args)
759 unpickler.persistent_load = persistent_load
--> 760 result = unpickler.load()
762 deserialized_storage_keys = pickle_module.load(f, **pickle_load_args)
AttributeError: Can't get attribute 'DistillBERTClass' on <module '__main__'>

I am not sure whether my script is wrong in the tokenizer step. Please check it. Could I see an example of your load and test functions? Thanks in advance.

Abhishek Kumar Mishra · Answer 3 · Thu Jul 09 2020 11:12:40 GMT+0800 (China Standard Time)

Hi @learnercat, apologies, I have been tied up with another project that I am working on.

I will be able to share a script with you by the end of this weekend. Hope that does not hamper your learning too much.

learnercat · Answer 4 · Fri Jul 10 2020 21:49:44 GMT+0800 (China Standard Time)

Hi @abhimishra91, thank you so much for your kind support. I have just solved the problem. Let me share the solution.
After retraining, I saved the model and vocab files as below:

model_to_save = model.module if hasattr(model, 'module') else model
torch.save(model_to_save.state_dict(), output_model_file)
tokenizer.save_vocabulary(output_vocab_file)

I imported DistillBERTClass from training.ipynb into predict.ipynb:
!pip install import-ipynb
import import_ipynb
from training import DistillBERTClass

Input to tokenizer -->
tok_path = "models/vocab_distilbert_news.bin"
tokenizer = transformers.DistilBertTokenizer.from_pretrained(tok_path)
inputs = tokenizer.encode_plus("Hello, my dog is cute", return_tensors="pt", add_special_tokens=True)

Input to model -->
model_path="./models/pytorch_distilbert_news.bin"
model = DistillBERTClass()
model.to(device)
model.load_state_dict(torch.load(model_path))
model.eval()

Create predict function -->
def predict(model, inputs):
    model.eval()
    with torch.no_grad():
        ids = inputs['input_ids'].to(device, dtype = torch.long)
        mask = inputs['attention_mask'].to(device, dtype = torch.long)
        outputs = model(ids, mask).squeeze()
    big_val, big_idx = torch.max(outputs.data, dim=1)
    return big_idx[0].item()

result = predict(model, inputs)
print(result)
1
I learned a lot. Again thank you very much. Have a great weekend!
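
The save-and-reload pattern described in this answer can be tried end to end without the trained checkpoint. The following is a minimal sketch under stated assumptions: TinyClassifier is a hypothetical stand-in for DistillBERTClass (an embedding layer in place of the DistilBERT body, plus a small classification head), and an in-memory buffer stands in for the .bin file.

```python
import io

import torch
from torch import nn


class TinyClassifier(nn.Module):
    """Hypothetical stand-in for DistillBERTClass: an encoder plus a custom head."""

    def __init__(self):
        super().__init__()
        self.encoder = nn.Embedding(100, 8)  # stands in for the DistilBERT body
        self.classifier = nn.Linear(8, 4)    # the extra layers added on top

    def forward(self, input_ids, attention_mask):
        hidden = self.encoder(input_ids)               # (batch, seq, 8)
        mask = attention_mask.unsqueeze(-1).float()    # (batch, seq, 1)
        pooled = (hidden * mask).sum(1) / mask.sum(1)  # masked mean over tokens
        return self.classifier(pooled)                 # (batch, 4)


trained = TinyClassifier()

# Save only the tensors. A BytesIO buffer stands in for the .bin file on disk.
buffer = io.BytesIO()
torch.save(trained.state_dict(), buffer)
buffer.seek(0)

# Loading side: the class definition must be available, then the saved
# weights are copied into a freshly constructed instance.
restored = TinyClassifier()
restored.load_state_dict(torch.load(buffer, map_location="cpu"))
restored.eval()

ids = torch.tensor([[5, 17, 42, 0]])
mask = torch.tensor([[1, 1, 1, 0]])
with torch.no_grad():
    logits = restored(ids, mask)             # shape (1, 4)
    predicted = logits.argmax(dim=1).item()  # index of the highest-scoring class
```

The logits keep their batch dimension in this sketch. Calling .squeeze() on a single-example batch would remove that dimension, after which dim=1 no longer exists, so it is worth checking the output shape of your own model before reducing over dim=1.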

Abhishek Kumar Mishra · Answer 5 · Sat Jul 11 2020 15:09:28 GMT+0800 (China Standard Time)

Yup, this is great stuff.

You can also specify the device while loading the state dict by passing map_location to torch.load:

model.load_state_dict(torch.load(model_path, map_location=device))
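
Note that map_location is a keyword argument of torch.load; load_state_dict does not accept it. A minimal sketch of the call order, assuming a small nn.Linear as a hypothetical stand-in for the fine-tuned model and an in-memory buffer in place of the checkpoint file:

```python
import io

import torch
from torch import nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(4, 2)  # hypothetical stand-in for the fine-tuned model

# A BytesIO buffer stands in for the checkpoint file.
buffer = io.BytesIO()
torch.save(model.state_dict(), buffer)
buffer.seek(0)

# map_location is passed to torch.load, so the saved tensors are remapped
# to `device` while they are deserialized.
state_dict = torch.load(buffer, map_location=device)

# load_state_dict copies values into the existing parameters, so the model
# itself is still moved to the target device explicitly.
model.to(device)
model.load_state_dict(state_dict)
```

This is mainly useful when a checkpoint saved on a GPU machine has to be loaded on a CPU-only machine.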

igormis · Answer 6 · Wed May 12 2021 20:34:19 GMT+0800 (China Standard Time)

Hi @learnercat, I did the same thing as you did, but I get the following error when I try to predict:

--> 889             result = self.forward(*input, **kwargs)
    890         for hook in itertools.chain(
    891                 _global_forward_hooks.values(),

TypeError: forward() missing 1 required positional argument: 'token_type_ids'
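
This TypeError suggests that the model class in use defines forward() with a third required parameter, token_type_ids, while the predict function passes only the input ids and the attention mask. The sketch below reproduces the mismatch and one possible way around it. ThreeInputClassifier is a hypothetical toy model, not the class from this thread, and the all-zeros token_type_ids tensor is an assumption that treats every token as part of the first segment.

```python
import torch
from torch import nn


class ThreeInputClassifier(nn.Module):
    """Hypothetical model whose forward() requires token_type_ids."""

    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(100, 8)
        self.classifier = nn.Linear(8, 4)

    def forward(self, input_ids, attention_mask, token_type_ids):
        # token_type_ids is accepted but unused in this toy network.
        hidden = self.embed(input_ids)
        mask = attention_mask.unsqueeze(-1).float()
        pooled = (hidden * mask).sum(1) / mask.sum(1)
        return self.classifier(pooled)


model = ThreeInputClassifier().eval()
ids = torch.tensor([[5, 17, 42]])
mask = torch.ones_like(ids)

# Calling with two arguments reproduces the TypeError from the traceback.
try:
    model(ids, mask)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Supplying the third argument satisfies the signature.
token_type_ids = torch.zeros_like(ids)
with torch.no_grad():
    logits = model(ids, mask, token_type_ids)
```

The other option is to remove token_type_ids from the forward() signature of the custom class. DistilBERT itself has no token type embeddings, so a wrapper around it does not need that argument.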