920232796 / bert_seq2seq

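The load_model_params function in utils.py loads only the BERT weights (the 3 embedding layers and the 12 transformer blocks) and does not load the decoder-layer parameters (for example, BertLMPredictionHead for the seq2seq task). Why is that?
(My guess is that results would be better with them loaded.)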

bert_seq2seq/bert_seq2seq/utils.py

Lines 42 to 50 in 74c5e36

    
    def load_model_params(model, pretrain_model_path):
        checkpoint = torch.load(pretrain_model_path)
        # At the start of training, load the pretrained BERT weights
        checkpoint = {k[5:]: v for k, v in checkpoint.items()
                      if k[:4] == "bert" and "pooler" not in k}
        model.load_state_dict(checkpoint, strict=False)
        torch.cuda.empty_cache()
        print("{} loaded!".format(pretrain_model_path))
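
To make the effect of the key filter concrete, here is a small sketch that applies the same condition to a toy dictionary instead of a real checkpoint. The key names are an assumption: they follow the layout commonly seen in BERT pre-training checkpoints, and the values are placeholders rather than tensors.

```python
# Toy stand-in for a checkpoint; values are placeholders, not tensors.
# Key names are assumed to follow the usual BERT pre-training layout.
toy_checkpoint = {
    "bert.embeddings.word_embeddings.weight": "emb",
    "bert.encoder.layer.0.attention.self.query.weight": "q0",
    "bert.pooler.dense.weight": "pool",
    "cls.predictions.transform.dense.weight": "head_dense",
    "cls.predictions.bias": "head_bias",
}


def is_loaded(key):
    """Same condition as in load_model_params."""
    return key[:4] == "bert" and "pooler" not in key


# Kept entries lose their first 5 characters (the "bert." prefix).
kept = {k[5:]: v for k, v in toy_checkpoint.items() if is_loaded(k)}
# Everything else never reaches load_state_dict.
dropped = [k for k in toy_checkpoint if not is_loaded(k)]

print("kept:   ", list(kept))
print("dropped:", dropped)
```

In this toy case the pooler entry and every `cls.*` entry end up in `dropped`, which is why the LM prediction head is not restored by this function.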

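You can give it a try. That said, when BERT loads pretrained parameters, these are all it loads: BERT isn't used only for seq2seq but for other tasks too, such as text classification, and in that case there is no need to load the decoder layer at all.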
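
For anyone who wants to run that experiment, one possible sketch is to select the LM-head entries in addition to the BERT body and rename them to match the target model. This is not the repository's code: `select_pretrained_keys`, the `cls.predictions.` source prefix, and the `decoder.` target prefix are all assumptions for illustration and must be checked against the actual checkpoint and model. It also matches `"bert."` with `startswith`, which is slightly stricter than the original `k[:4] == "bert"` test.

```python
def select_pretrained_keys(checkpoint,
                           head_prefix="cls.predictions.",
                           target_prefix="decoder."):
    """Keep the BERT body (minus pooler) and also the LM-head entries.

    head_prefix:   assumed location of the LM head in the checkpoint.
    target_prefix: HYPOTHETICAL attribute name of the head in your model.
    """
    selected = {}
    for key, value in checkpoint.items():
        if key.startswith("bert.") and "pooler" not in key:
            # Strip the "bert." prefix, as the original function does.
            selected[key[len("bert."):]] = value
        elif key.startswith(head_prefix):
            # Rename head entries so they can match the model's own names.
            selected[target_prefix + key[len(head_prefix):]] = value
    return selected


# Toy checkpoint with placeholder values instead of tensors.
toy_checkpoint = {
    "bert.embeddings.word_embeddings.weight": 0,
    "bert.pooler.dense.weight": 1,
    "cls.predictions.transform.dense.weight": 2,
    "cls.predictions.decoder.weight": 3,
    "cls.seq_relationship.weight": 4,
}
selected = select_pretrained_keys(toy_checkpoint)
print(list(selected))
```

Because `load_state_dict(..., strict=False)` skips names that do not match instead of raising, it is worth inspecting its return value (`missing_keys` and `unexpected_keys`) to confirm that the head entries were actually picked up.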

There seems to be a UniLM pretrained model that loads exactly the parameters you mention; what I'm using here is the BERT pretrained model.

Got it, thanks. I basically understand now.

ok


Question about pretrained weights