microsoft / unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

Home Page: https://aka.ms/GeneralAI


Failed to load MiniLM from transformers.

SefaZeng opened this issue

Describe the issue
Model I am using: MiniLM. The script is as follows:

from fairseq.models.roberta import XLMRModel  # not used in this snippet
from transformers import AutoTokenizer, AutoModel

# Load the MiniLM checkpoint (xlmr_l6h384) from a local directory
xlmr = AutoModel.from_pretrained("../pretrain_models/xlmr_l6h384")
tokenizer = AutoTokenizer.from_pretrained("../pretrain_models/xlmr_l6h384")
xlmr.eval()

But it seems the architectures of XLM-R and MiniLM are not the same, and there are some warnings about this:

Some weights of the model checkpoint at ../pretrain_models/xlmr_l6h384 were not used when initializing XLMRobertaModel: ['lm_head.bias', 'lm_head.dense.weight', 'lm_head.dense.bias', 'lm_head.layer_norm.weight', 'lm_head.layer_norm.bias']

  • This IS expected if you are initializing XLMRobertaModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
  • This IS NOT expected if you are initializing XLMRobertaModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of XLMRobertaModel were not initialized from the model checkpoint at ../pretrain_models/xlmr_l6h384 and are newly initialized: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']

Do I need to change the transformers version?

Please repost the issue in the https://github.com/huggingface/transformers repo


I mean, even when following the script in the MiniLM README, these warnings still appear. Is there something wrong with the example in MiniLM, or should I just ignore them?

Changing AutoModel to RobertaForMaskedLM solved this.
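
For reference, a minimal sketch of that fix, assuming the same local checkpoint path as above. Because the checkpoint is XLM-R based, AutoModelForMaskedLM resolves to XLMRobertaForMaskedLM, which keeps the lm_head.* weights and does not add a randomly initialized pooler:

from transformers import AutoTokenizer, AutoModelForMaskedLM

# Loading through the masked-LM auto class keeps the lm_head.* weights,
# so no checkpoint weights are dropped and no pooler is newly initialized.
model = AutoModelForMaskedLM.from_pretrained("../pretrain_models/xlmr_l6h384")
tokenizer = AutoTokenizer.from_pretrained("../pretrain_models/xlmr_l6h384")
model.eval()

If only the encoder outputs are needed, the original AutoModel warnings can also be ignored: the unused weights are just the pretraining LM head, and the newly initialized pooler only matters if you rely on pooler_output.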