Problem loading checkpoint for AspectPolarityClassification
zedavid opened this issue · comments
Please provide the REQUIRED information. Otherwise, it is almost impossible to locate the problem. DO NOT CHANGE THE FORM.
PyABSA Version (Required)
Python Version: 3.11.15
PyABSA Version: 2.1.6
Torch Version: 2.1.1+cu118
Transformers Version: 4.35.2
Describe the bug
I'm trying to run the recipe for aspect-level polarity classification, following the steps in the documentation tutorial. When I load the model from the trained checkpoint, the state_dict file does not contain the keys the model architecture expects.
Code To Reproduce (Required)
import torch
from transformers import AutoModel

# Load the DeBERTa backbone, then try to restore the PyABSA checkpoint into it directly.
pyabsa_model = AutoModel.from_pretrained('yangheng/deberta-v3-base-absa-v1.1')
pyabsa_model.load_state_dict(torch.load('fast_lcf_bert.state_dict'))  # raises RuntimeError below
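For context on why the checkpoint keys look the way they do: PyTorch state_dict keys mirror the attribute path of each submodule, so a backbone stored under a `models` ModuleList as attribute `bert4global` gets every key prefixed with `models.0.bert4global.`, exactly as in the traceback below. A minimal sketch with toy stand-in modules (the names mimic the checkpoint, not PyABSA's actual classes):

```python
import torch.nn as nn

# Toy stand-in for the FAST_LCF_BERT backbone.
class Backbone(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(2, 2)

# Wrapper that nests the backbone the way the checkpoint keys suggest.
class Wrapper(nn.Module):
    def __init__(self):
        super().__init__()
        inner = nn.Module()
        inner.bert4global = Backbone()  # attribute name becomes part of the key
        self.models = nn.ModuleList([inner])

keys = list(Wrapper().state_dict().keys())
# keys contain e.g. 'models.0.bert4global.encoder.weight'
```

This is why a bare `DebertaV2Model`, whose keys start at `embeddings.` / `encoder.`, cannot consume the checkpoint as-is.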
Full Console Output (Required)
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
RuntimeError: Error(s) in loading state_dict for DebertaV2Model:
Missing key(s) in state_dict: "embeddings.word_embeddings.weight", "embeddings.LayerNorm.weight", "embeddings.LayerNorm.bias", "encoder.layer.0.attention.self.query_proj.weight", "encoder.layer.0.attention.self.query_proj.bias", "encoder.layer.0.attention.self.key_proj.weight", "encoder.layer.0.attention.self.key_proj.bias", "encoder.layer.0.attention.self.value_proj.weight", "encoder.layer.0.attention.self.value_proj.bias", "encoder.layer.0.attention.output.dense.weight", "encoder.layer.0.attention.output.dense.bias", "encoder.layer.0.attention.output.LayerNorm.weight", "encoder.layer.0.attention.output.LayerNorm.bias", "encoder.layer.0.intermediate.dense.weight", "encoder.layer.0.intermediate.dense.bias", "encoder.layer.0.output.dense.weight", "encoder.layer.0.output.dense.bias", "encoder.layer.0.output.LayerNorm.weight", "encoder.layer.0.output.LayerNorm.bias", "encoder.layer.1.attention.self.query_proj.weight", "encoder.layer.1.attention.self.query_proj.bias", "encoder.layer.1.attention.self.key_proj.weight", "encoder.layer.1.attention.self.key_proj.bias", "encoder.layer.1.attention.self.value_proj.weight", "encoder.layer.1.attention.self.value_proj.bias", "encoder.layer.1.attention.output.dense.weight", "encoder.layer.1.attention.output.dense.bias", "encoder.layer.1.attention.output.LayerNorm.weight", "encoder.layer.1.attention.output.LayerNorm.bias", "encoder.layer.1.intermediate.dense.weight", "encoder.layer.1.intermediate.dense.bias", "encoder.layer.1.output.dense.weight", "encoder.layer.1.output.dense.bias", "encoder.layer.1.output.LayerNorm.weight", "encoder.layer.1.output.LayerNorm.bias", "encoder.layer.2.attention.self.query_proj.weight", "encoder.layer.2.attention.self.query_proj.bias", "encoder.layer.2.attention.self.key_proj.weight", "encoder.layer.2.attention.self.key_proj.bias", "encoder.layer.2.attention.self.value_proj.weight", "encoder.layer.2.attention.self.value_proj.bias", "encoder.layer.2.attention.output.dense.weight", 
"encoder.layer.2.attention.output.dense.bias", "encoder.layer.2.attention.output.LayerNorm.weight", "encoder.layer.2.attention.output.LayerNorm.bias", "encoder.layer.2.intermediate.dense.weight", "encoder.layer.2.intermediate.dense.bias", "encoder.layer.2.output.dense.weight", "encoder.layer.2.output.dense.bias", "encoder.layer.2.output.LayerNorm.weight", "encoder.layer.2.output.LayerNorm.bias", "encoder.layer.3.attention.self.query_proj.weight", "encoder.layer.3.attention.self.query_proj.bias", "encoder.layer.3.attention.self.key_proj.weight", "encoder.layer.3.attention.self.key_proj.bias", "encoder.layer.3.attention.self.value_proj.weight", "encoder.layer.3.attention.self.value_proj.bias", "encoder.layer.3.attention.output.dense.weight", "encoder.layer.3.attention.output.dense.bias", "encoder.layer.3.attention.output.LayerNorm.weight", "encoder.layer.3.attention.output.LayerNorm.bias", "encoder.layer.3.intermediate.dense.weight", "encoder.layer.3.intermediate.dense.bias", "encoder.layer.3.output.dense.weight", "encoder.layer.3.output.dense.bias", "encoder.layer.3.output.LayerNorm.weight", "encoder.layer.3.output.LayerNorm.bias", "encoder.layer.4.attention.self.query_proj.weight", "encoder.layer.4.attention.self.query_proj.bias", "encoder.layer.4.attention.self.key_proj.weight", "encoder.layer.4.attention.self.key_proj.bias", "encoder.layer.4.attention.self.value_proj.weight", "encoder.layer.4.attention.self.value_proj.bias", "encoder.layer.4.attention.output.dense.weight", "encoder.layer.4.attention.output.dense.bias", "encoder.layer.4.attention.output.LayerNorm.weight", "encoder.layer.4.attention.output.LayerNorm.bias", "encoder.layer.4.intermediate.dense.weight", "encoder.layer.4.intermediate.dense.bias", "encoder.layer.4.output.dense.weight", "encoder.layer.4.output.dense.bias", "encoder.layer.4.output.LayerNorm.weight", "encoder.layer.4.output.LayerNorm.bias", "encoder.layer.5.attention.self.query_proj.weight", 
"encoder.layer.5.attention.self.query_proj.bias", "encoder.layer.5.attention.self.key_proj.weight", "encoder.layer.5.attention.self.key_proj.bias", "encoder.layer.5.attention.self.value_proj.weight", "encoder.layer.5.attention.self.value_proj.bias", "encoder.layer.5.attention.output.dense.weight", "encoder.layer.5.attention.output.dense.bias", "encoder.layer.5.attention.output.LayerNorm.weight", "encoder.layer.5.attention.output.LayerNorm.bias", "encoder.layer.5.intermediate.dense.weight", "encoder.layer.5.intermediate.dense.bias", "encoder.layer.5.output.dense.weight", "encoder.layer.5.output.dense.bias", "encoder.layer.5.output.LayerNorm.weight", "encoder.layer.5.output.LayerNorm.bias", "encoder.layer.6.attention.self.query_proj.weight", "encoder.layer.6.attention.self.query_proj.bias", "encoder.layer.6.attention.self.key_proj.weight", "encoder.layer.6.attention.self.key_proj.bias", "encoder.layer.6.attention.self.value_proj.weight", "encoder.layer.6.attention.self.value_proj.bias", "encoder.layer.6.attention.output.dense.weight", "encoder.layer.6.attention.output.dense.bias", "encoder.layer.6.attention.output.LayerNorm.weight", "encoder.layer.6.attention.output.LayerNorm.bias", "encoder.layer.6.intermediate.dense.weight", "encoder.layer.6.intermediate.dense.bias", "encoder.layer.6.output.dense.weight", "encoder.layer.6.output.dense.bias", "encoder.layer.6.output.LayerNorm.weight", "encoder.layer.6.output.LayerNorm.bias", "encoder.layer.7.attention.self.query_proj.weight", "encoder.layer.7.attention.self.query_proj.bias", "encoder.layer.7.attention.self.key_proj.weight", "encoder.layer.7.attention.self.key_proj.bias", "encoder.layer.7.attention.self.value_proj.weight", "encoder.layer.7.attention.self.value_proj.bias", "encoder.layer.7.attention.output.dense.weight", "encoder.layer.7.attention.output.dense.bias", "encoder.layer.7.attention.output.LayerNorm.weight", "encoder.layer.7.attention.output.LayerNorm.bias", "encoder.layer.7.intermediate.dense.weight", 
"encoder.layer.7.intermediate.dense.bias", "encoder.layer.7.output.dense.weight", "encoder.layer.7.output.dense.bias", "encoder.layer.7.output.LayerNorm.weight", "encoder.layer.7.output.LayerNorm.bias", "encoder.layer.8.attention.self.query_proj.weight", "encoder.layer.8.attention.self.query_proj.bias", "encoder.layer.8.attention.self.key_proj.weight", "encoder.layer.8.attention.self.key_proj.bias", "encoder.layer.8.attention.self.value_proj.weight", "encoder.layer.8.attention.self.value_proj.bias", "encoder.layer.8.attention.output.dense.weight", "encoder.layer.8.attention.output.dense.bias", "encoder.layer.8.attention.output.LayerNorm.weight", "encoder.layer.8.attention.output.LayerNorm.bias", "encoder.layer.8.intermediate.dense.weight", "encoder.layer.8.intermediate.dense.bias", "encoder.layer.8.output.dense.weight", "encoder.layer.8.output.dense.bias", "encoder.layer.8.output.LayerNorm.weight", "encoder.layer.8.output.LayerNorm.bias", "encoder.layer.9.attention.self.query_proj.weight", "encoder.layer.9.attention.self.query_proj.bias", "encoder.layer.9.attention.self.key_proj.weight", "encoder.layer.9.attention.self.key_proj.bias", "encoder.layer.9.attention.self.value_proj.weight", "encoder.layer.9.attention.self.value_proj.bias", "encoder.layer.9.attention.output.dense.weight", "encoder.layer.9.attention.output.dense.bias", "encoder.layer.9.attention.output.LayerNorm.weight", "encoder.layer.9.attention.output.LayerNorm.bias", "encoder.layer.9.intermediate.dense.weight", "encoder.layer.9.intermediate.dense.bias", "encoder.layer.9.output.dense.weight", "encoder.layer.9.output.dense.bias", "encoder.layer.9.output.LayerNorm.weight", "encoder.layer.9.output.LayerNorm.bias", "encoder.layer.10.attention.self.query_proj.weight", "encoder.layer.10.attention.self.query_proj.bias", "encoder.layer.10.attention.self.key_proj.weight", "encoder.layer.10.attention.self.key_proj.bias", "encoder.layer.10.attention.self.value_proj.weight", 
"encoder.layer.10.attention.self.value_proj.bias", "encoder.layer.10.attention.output.dense.weight", "encoder.layer.10.attention.output.dense.bias", "encoder.layer.10.attention.output.LayerNorm.weight", "encoder.layer.10.attention.output.LayerNorm.bias", "encoder.layer.10.intermediate.dense.weight", "encoder.layer.10.intermediate.dense.bias", "encoder.layer.10.output.dense.weight", "encoder.layer.10.output.dense.bias", "encoder.layer.10.output.LayerNorm.weight", "encoder.layer.10.output.LayerNorm.bias", "encoder.layer.11.attention.self.query_proj.weight", "encoder.layer.11.attention.self.query_proj.bias", "encoder.layer.11.attention.self.key_proj.weight", "encoder.layer.11.attention.self.key_proj.bias", "encoder.layer.11.attention.self.value_proj.weight", "encoder.layer.11.attention.self.value_proj.bias", "encoder.layer.11.attention.output.dense.weight", "encoder.layer.11.attention.output.dense.bias", "encoder.layer.11.attention.output.LayerNorm.weight", "encoder.layer.11.attention.output.LayerNorm.bias", "encoder.layer.11.intermediate.dense.weight", "encoder.layer.11.intermediate.dense.bias", "encoder.layer.11.output.dense.weight", "encoder.layer.11.output.dense.bias", "encoder.layer.11.output.LayerNorm.weight", "encoder.layer.11.output.LayerNorm.bias", "encoder.rel_embeddings.weight", "encoder.LayerNorm.weight", "encoder.LayerNorm.bias".
Unexpected key(s) in state_dict: "models.0.bert4global.embeddings.position_ids", "models.0.bert4global.embeddings.word_embeddings.weight", "models.0.bert4global.embeddings.LayerNorm.weight", "models.0.bert4global.embeddings.LayerNorm.bias", "models.0.bert4global.encoder.layer.0.attention.self.query_proj.weight", "models.0.bert4global.encoder.layer.0.attention.self.query_proj.bias", "models.0.bert4global.encoder.layer.0.attention.self.key_proj.weight", "models.0.bert4global.encoder.layer.0.attention.self.key_proj.bias", "models.0.bert4global.encoder.layer.0.attention.self.value_proj.weight", "models.0.bert4global.encoder.layer.0.attention.self.value_proj.bias", "models.0.bert4global.encoder.layer.0.attention.output.dense.weight", "models.0.bert4global.encoder.layer.0.attention.output.dense.bias", "models.0.bert4global.encoder.layer.0.attention.output.LayerNorm.weight", "models.0.bert4global.encoder.layer.0.attention.output.LayerNorm.bias", "models.0.bert4global.encoder.layer.0.intermediate.dense.weight", "models.0.bert4global.encoder.layer.0.intermediate.dense.bias", "models.0.bert4global.encoder.layer.0.output.dense.weight", "models.0.bert4global.encoder.layer.0.output.dense.bias", "models.0.bert4global.encoder.layer.0.output.LayerNorm.weight", "models.0.bert4global.encoder.layer.0.output.LayerNorm.bias", "models.0.bert4global.encoder.layer.1.attention.self.query_proj.weight", "models.0.bert4global.encoder.layer.1.attention.self.query_proj.bias", "models.0.bert4global.encoder.layer.1.attention.self.key_proj.weight", "models.0.bert4global.encoder.layer.1.attention.self.key_proj.bias", "models.0.bert4global.encoder.layer.1.attention.self.value_proj.weight", "models.0.bert4global.encoder.layer.1.attention.self.value_proj.bias", "models.0.bert4global.encoder.layer.1.attention.output.dense.weight", "models.0.bert4global.encoder.layer.1.attention.output.dense.bias", "models.0.bert4global.encoder.layer.1.attention.output.LayerNorm.weight", 
"models.0.bert4global.encoder.layer.1.attention.output.LayerNorm.bias", "models.0.bert4global.encoder.layer.1.intermediate.dense.weight", "models.0.bert4global.encoder.layer.1.intermediate.dense.bias", "models.0.bert4global.encoder.layer.1.output.dense.weight", "models.0.bert4global.encoder.layer.1.output.dense.bias", "models.0.bert4global.encoder.layer.1.output.LayerNorm.weight", "models.0.bert4global.encoder.layer.1.output.LayerNorm.bias", "models.0.bert4global.encoder.layer.2.attention.self.query_proj.weight", "models.0.bert4global.encoder.layer.2.attention.self.query_proj.bias", "models.0.bert4global.encoder.layer.2.attention.self.key_proj.weight", "models.0.bert4global.encoder.layer.2.attention.self.key_proj.bias", "models.0.bert4global.encoder.layer.2.attention.self.value_proj.weight", "models.0.bert4global.encoder.layer.2.attention.self.value_proj.bias", "models.0.bert4global.encoder.layer.2.attention.output.dense.weight", "models.0.bert4global.encoder.layer.2.attention.output.dense.bias", "models.0.bert4global.encoder.layer.2.attention.output.LayerNorm.weight", "models.0.bert4global.encoder.layer.2.attention.output.LayerNorm.bias", "models.0.bert4global.encoder.layer.2.intermediate.dense.weight", "models.0.bert4global.encoder.layer.2.intermediate.dense.bias", "models.0.bert4global.encoder.layer.2.output.dense.weight", "models.0.bert4global.encoder.layer.2.output.dense.bias", "models.0.bert4global.encoder.layer.2.output.LayerNorm.weight", "models.0.bert4global.encoder.layer.2.output.LayerNorm.bias", "models.0.bert4global.encoder.layer.3.attention.self.query_proj.weight", "models.0.bert4global.encoder.layer.3.attention.self.query_proj.bias", "models.0.bert4global.encoder.layer.3.attention.self.key_proj.weight", "models.0.bert4global.encoder.layer.3.attention.self.key_proj.bias", "models.0.bert4global.encoder.layer.3.attention.self.value_proj.weight", "models.0.bert4global.encoder.layer.3.attention.self.value_proj.bias", 
"models.0.bert4global.encoder.layer.3.attention.output.dense.weight", "models.0.bert4global.encoder.layer.3.attention.output.dense.bias", "models.0.bert4global.encoder.layer.3.attention.output.LayerNorm.weight", "models.0.bert4global.encoder.layer.3.attention.output.LayerNorm.bias", "models.0.bert4global.encoder.layer.3.intermediate.dense.weight", "models.0.bert4global.encoder.layer.3.intermediate.dense.bias", "models.0.bert4global.encoder.layer.3.output.dense.weight", "models.0.bert4global.encoder.layer.3.output.dense.bias", "models.0.bert4global.encoder.layer.3.output.LayerNorm.weight", "models.0.bert4global.encoder.layer.3.output.LayerNorm.bias", "models.0.bert4global.encoder.layer.4.attention.self.query_proj.weight", "models.0.bert4global.encoder.layer.4.attention.self.query_proj.bias", "models.0.bert4global.encoder.layer.4.attention.self.key_proj.weight", "models.0.bert4global.encoder.layer.4.attention.self.key_proj.bias", "models.0.bert4global.encoder.layer.4.attention.self.value_proj.weight", "models.0.bert4global.encoder.layer.4.attention.self.value_proj.bias", "models.0.bert4global.encoder.layer.4.attention.output.dense.weight", "models.0.bert4global.encoder.layer.4.attention.output.dense.bias", "models.0.bert4global.encoder.layer.4.attention.output.LayerNorm.weight", "models.0.bert4global.encoder.layer.4.attention.output.LayerNorm.bias", "models.0.bert4global.encoder.layer.4.intermediate.dense.weight", "models.0.bert4global.encoder.layer.4.intermediate.dense.bias", "models.0.bert4global.encoder.layer.4.output.dense.weight", "models.0.bert4global.encoder.layer.4.output.dense.bias", "models.0.bert4global.encoder.layer.4.output.LayerNorm.weight", "models.0.bert4global.encoder.layer.4.output.LayerNorm.bias", "models.0.bert4global.encoder.layer.5.attention.self.query_proj.weight", "models.0.bert4global.encoder.layer.5.attention.self.query_proj.bias", "models.0.bert4global.encoder.layer.5.attention.self.key_proj.weight", 
"models.0.bert4global.encoder.layer.5.attention.self.key_proj.bias", "models.0.bert4global.encoder.layer.5.attention.self.value_proj.weight", "models.0.bert4global.encoder.layer.5.attention.self.value_proj.bias", "models.0.bert4global.encoder.layer.5.attention.output.dense.weight", "models.0.bert4global.encoder.layer.5.attention.output.dense.bias", "models.0.bert4global.encoder.layer.5.attention.output.LayerNorm.weight", "models.0.bert4global.encoder.layer.5.attention.output.LayerNorm.bias", "models.0.bert4global.encoder.layer.5.intermediate.dense.weight", "models.0.bert4global.encoder.layer.5.intermediate.dense.bias", "models.0.bert4global.encoder.layer.5.output.dense.weight", "models.0.bert4global.encoder.layer.5.output.dense.bias", "models.0.bert4global.encoder.layer.5.output.LayerNorm.weight", "models.0.bert4global.encoder.layer.5.output.LayerNorm.bias", "models.0.bert4global.encoder.layer.6.attention.self.query_proj.weight", "models.0.bert4global.encoder.layer.6.attention.self.query_proj.bias", "models.0.bert4global.encoder.layer.6.attention.self.key_proj.weight", "models.0.bert4global.encoder.layer.6.attention.self.key_proj.bias", "models.0.bert4global.encoder.layer.6.attention.self.value_proj.weight", "models.0.bert4global.encoder.layer.6.attention.self.value_proj.bias", "models.0.bert4global.encoder.layer.6.attention.output.dense.weight", "models.0.bert4global.encoder.layer.6.attention.output.dense.bias", "models.0.bert4global.encoder.layer.6.attention.output.LayerNorm.weight", "models.0.bert4global.encoder.layer.6.attention.output.LayerNorm.bias", "models.0.bert4global.encoder.layer.6.intermediate.dense.weight", "models.0.bert4global.encoder.layer.6.intermediate.dense.bias", "models.0.bert4global.encoder.layer.6.output.dense.weight", "models.0.bert4global.encoder.layer.6.output.dense.bias", "models.0.bert4global.encoder.layer.6.output.LayerNorm.weight", "models.0.bert4global.encoder.layer.6.output.LayerNorm.bias", 
"models.0.bert4global.encoder.layer.7.attention.self.query_proj.weight", "models.0.bert4global.encoder.layer.7.attention.self.query_proj.bias", "models.0.bert4global.encoder.layer.7.attention.self.key_proj.weight", "models.0.bert4global.encoder.layer.7.attention.self.key_proj.bias", "models.0.bert4global.encoder.layer.7.attention.self.value_proj.weight", "models.0.bert4global.encoder.layer.7.attention.self.value_proj.bias", "models.0.bert4global.encoder.layer.7.attention.output.dense.weight", "models.0.bert4global.encoder.layer.7.attention.output.dense.bias", "models.0.bert4global.encoder.layer.7.attention.output.LayerNorm.weight", "models.0.bert4global.encoder.layer.7.attention.output.LayerNorm.bias", "models.0.bert4global.encoder.layer.7.intermediate.dense.weight", "models.0.bert4global.encoder.layer.7.intermediate.dense.bias", "models.0.bert4global.encoder.layer.7.output.dense.weight", "models.0.bert4global.encoder.layer.7.output.dense.bias", "models.0.bert4global.encoder.layer.7.output.LayerNorm.weight", "models.0.bert4global.encoder.layer.7.output.LayerNorm.bias", "models.0.bert4global.encoder.layer.8.attention.self.query_proj.weight", "models.0.bert4global.encoder.layer.8.attention.self.query_proj.bias", "models.0.bert4global.encoder.layer.8.attention.self.key_proj.weight", "models.0.bert4global.encoder.layer.8.attention.self.key_proj.bias", "models.0.bert4global.encoder.layer.8.attention.self.value_proj.weight", "models.0.bert4global.encoder.layer.8.attention.self.value_proj.bias", "models.0.bert4global.encoder.layer.8.attention.output.dense.weight", "models.0.bert4global.encoder.layer.8.attention.output.dense.bias", "models.0.bert4global.encoder.layer.8.attention.output.LayerNorm.weight", "models.0.bert4global.encoder.layer.8.attention.output.LayerNorm.bias", "models.0.bert4global.encoder.layer.8.intermediate.dense.weight", "models.0.bert4global.encoder.layer.8.intermediate.dense.bias", "models.0.bert4global.encoder.layer.8.output.dense.weight", 
"models.0.bert4global.encoder.layer.8.output.dense.bias", "models.0.bert4global.encoder.layer.8.output.LayerNorm.weight", "models.0.bert4global.encoder.layer.8.output.LayerNorm.bias", "models.0.bert4global.encoder.layer.9.attention.self.query_proj.weight", "models.0.bert4global.encoder.layer.9.attention.self.query_proj.bias", "models.0.bert4global.encoder.layer.9.attention.self.key_proj.weight", "models.0.bert4global.encoder.layer.9.attention.self.key_proj.bias", "models.0.bert4global.encoder.layer.9.attention.self.value_proj.weight", "models.0.bert4global.encoder.layer.9.attention.self.value_proj.bias", "models.0.bert4global.encoder.layer.9.attention.output.dense.weight", "models.0.bert4global.encoder.layer.9.attention.output.dense.bias", "models.0.bert4global.encoder.layer.9.attention.output.LayerNorm.weight", "models.0.bert4global.encoder.layer.9.attention.output.LayerNorm.bias", "models.0.bert4global.encoder.layer.9.intermediate.dense.weight", "models.0.bert4global.encoder.layer.9.intermediate.dense.bias", "models.0.bert4global.encoder.layer.9.output.dense.weight", "models.0.bert4global.encoder.layer.9.output.dense.bias", "models.0.bert4global.encoder.layer.9.output.LayerNorm.weight", "models.0.bert4global.encoder.layer.9.output.LayerNorm.bias", "models.0.bert4global.encoder.layer.10.attention.self.query_proj.weight", "models.0.bert4global.encoder.layer.10.attention.self.query_proj.bias", "models.0.bert4global.encoder.layer.10.attention.self.key_proj.weight", "models.0.bert4global.encoder.layer.10.attention.self.key_proj.bias", "models.0.bert4global.encoder.layer.10.attention.self.value_proj.weight", "models.0.bert4global.encoder.layer.10.attention.self.value_proj.bias", "models.0.bert4global.encoder.layer.10.attention.output.dense.weight", "models.0.bert4global.encoder.layer.10.attention.output.dense.bias", "models.0.bert4global.encoder.layer.10.attention.output.LayerNorm.weight", "models.0.bert4global.encoder.layer.10.attention.output.LayerNorm.bias", 
"models.0.bert4global.encoder.layer.10.intermediate.dense.weight", "models.0.bert4global.encoder.layer.10.intermediate.dense.bias", "models.0.bert4global.encoder.layer.10.output.dense.weight", "models.0.bert4global.encoder.layer.10.output.dense.bias", "models.0.bert4global.encoder.layer.10.output.LayerNorm.weight", "models.0.bert4global.encoder.layer.10.output.LayerNorm.bias", "models.0.bert4global.encoder.layer.11.attention.self.query_proj.weight", "models.0.bert4global.encoder.layer.11.attention.self.query_proj.bias", "models.0.bert4global.encoder.layer.11.attention.self.key_proj.weight", "models.0.bert4global.encoder.layer.11.attention.self.key_proj.bias", "models.0.bert4global.encoder.layer.11.attention.self.value_proj.weight", "models.0.bert4global.encoder.layer.11.attention.self.value_proj.bias", "models.0.bert4global.encoder.layer.11.attention.output.dense.weight", "models.0.bert4global.encoder.layer.11.attention.output.dense.bias", "models.0.bert4global.encoder.layer.11.attention.output.LayerNorm.weight", "models.0.bert4global.encoder.layer.11.attention.output.LayerNorm.bias", "models.0.bert4global.encoder.layer.11.intermediate.dense.weight", "models.0.bert4global.encoder.layer.11.intermediate.dense.bias", "models.0.bert4global.encoder.layer.11.output.dense.weight", "models.0.bert4global.encoder.layer.11.output.dense.bias", "models.0.bert4global.encoder.layer.11.output.LayerNorm.weight", "models.0.bert4global.encoder.layer.11.output.LayerNorm.bias", "models.0.bert4global.encoder.rel_embeddings.weight", "models.0.bert4global.encoder.LayerNorm.weight", "models.0.bert4global.encoder.LayerNorm.bias", "models.0.bert4local.embeddings.position_ids", "models.0.bert4local.embeddings.word_embeddings.weight", "models.0.bert4local.embeddings.LayerNorm.weight", "models.0.bert4local.embeddings.LayerNorm.bias", "models.0.bert4local.encoder.layer.0.attention.self.query_proj.weight", "models.0.bert4local.encoder.layer.0.attention.self.query_proj.bias", 
"models.0.bert4local.encoder.layer.0.attention.self.key_proj.weight", "models.0.bert4local.encoder.layer.0.attention.self.key_proj.bias", "models.0.bert4local.encoder.layer.0.attention.self.value_proj.weight", "models.0.bert4local.encoder.layer.0.attention.self.value_proj.bias", "models.0.bert4local.encoder.layer.0.attention.output.dense.weight", "models.0.bert4local.encoder.layer.0.attention.output.dense.bias", "models.0.bert4local.encoder.layer.0.attention.output.LayerNorm.weight", "models.0.bert4local.encoder.layer.0.attention.output.LayerNorm.bias", "models.0.bert4local.encoder.layer.0.intermediate.dense.weight", "models.0.bert4local.encoder.layer.0.intermediate.dense.bias", "models.0.bert4local.encoder.layer.0.output.dense.weight", "models.0.bert4local.encoder.layer.0.output.dense.bias", "models.0.bert4local.encoder.layer.0.output.LayerNorm.weight", "models.0.bert4local.encoder.layer.0.output.LayerNorm.bias", "models.0.bert4local.encoder.layer.1.attention.self.query_proj.weight", "models.0.bert4local.encoder.layer.1.attention.self.query_proj.bias", "models.0.bert4local.encoder.layer.1.attention.self.key_proj.weight", "models.0.bert4local.encoder.layer.1.attention.self.key_proj.bias", "models.0.bert4local.encoder.layer.1.attention.self.value_proj.weight", "models.0.bert4local.encoder.layer.1.attention.self.value_proj.bias", "models.0.bert4local.encoder.layer.1.attention.output.dense.weight", "models.0.bert4local.encoder.layer.1.attention.output.dense.bias", "models.0.bert4local.encoder.layer.1.attention.output.LayerNorm.weight", "models.0.bert4local.encoder.layer.1.attention.output.LayerNorm.bias", "models.0.bert4local.encoder.layer.1.intermediate.dense.weight", "models.0.bert4local.encoder.layer.1.intermediate.dense.bias", "models.0.bert4local.encoder.layer.1.output.dense.weight", "models.0.bert4local.encoder.layer.1.output.dense.bias", "models.0.bert4local.encoder.layer.1.output.LayerNorm.weight", "models.0.bert4local.encoder.layer.1.output.LayerNorm.bias", 
"models.0.bert4local.encoder.layer.2.attention.self.query_proj.weight", "models.0.bert4local.encoder.layer.2.attention.self.query_proj.bias", "models.0.bert4local.encoder.layer.2.attention.self.key_proj.weight", "models.0.bert4local.encoder.layer.2.attention.self.key_proj.bias", "models.0.bert4local.encoder.layer.2.attention.self.value_proj.weight", "models.0.bert4local.encoder.layer.2.attention.self.value_proj.bias", "models.0.bert4local.encoder.layer.2.attention.output.dense.weight", "models.0.bert4local.encoder.layer.2.attention.output.dense.bias", "models.0.bert4local.encoder.layer.2.attention.output.LayerNorm.weight", "models.0.bert4local.encoder.layer.2.attention.output.LayerNorm.bias", "models.0.bert4local.encoder.layer.2.intermediate.dense.weight", "models.0.bert4local.encoder.layer.2.intermediate.dense.bias", "models.0.bert4local.encoder.layer.2.output.dense.weight", "models.0.bert4local.encoder.layer.2.output.dense.bias", "models.0.bert4local.encoder.layer.2.output.LayerNorm.weight", "models.0.bert4local.encoder.layer.2.output.LayerNorm.bias", "models.0.bert4local.encoder.layer.3.attention.self.query_proj.weight", "models.0.bert4local.encoder.layer.3.attention.self.query_proj.bias", "models.0.bert4local.encoder.layer.3.attention.self.key_proj.weight", "models.0.bert4local.encoder.layer.3.attention.self.key_proj.bias", "models.0.bert4local.encoder.layer.3.attention.self.value_proj.weight", "models.0.bert4local.encoder.layer.3.attention.self.value_proj.bias", "models.0.bert4local.encoder.layer.3.attention.output.dense.weight", "models.0.bert4local.encoder.layer.3.attention.output.dense.bias", "models.0.bert4local.encoder.layer.3.attention.output.LayerNorm.weight", "models.0.bert4local.encoder.layer.3.attention.output.LayerNorm.bias", "models.0.bert4local.encoder.layer.3.intermediate.dense.weight", "models.0.bert4local.encoder.layer.3.intermediate.dense.bias", "models.0.bert4local.encoder.layer.3.output.dense.weight", 
"models.0.bert4local.encoder.layer.3.output.dense.bias", "models.0.bert4local.encoder.layer.3.output.LayerNorm.weight", "models.0.bert4local.encoder.layer.3.output.LayerNorm.bias", "models.0.bert4local.encoder.layer.4.attention.self.query_proj.weight", "models.0.bert4local.encoder.layer.4.attention.self.query_proj.bias", "models.0.bert4local.encoder.layer.4.attention.self.key_proj.weight", "models.0.bert4local.encoder.layer.4.attention.self.key_proj.bias", "models.0.bert4local.encoder.layer.4.attention.self.value_proj.weight", "models.0.bert4local.encoder.layer.4.attention.self.value_proj.bias", "models.0.bert4local.encoder.layer.4.attention.output.dense.weight", "models.0.bert4local.encoder.layer.4.attention.output.dense.bias", "models.0.bert4local.encoder.layer.4.attention.output.LayerNorm.weight", "models.0.bert4local.encoder.layer.4.attention.output.LayerNorm.bias", "models.0.bert4local.encoder.layer.4.intermediate.dense.weight", "models.0.bert4local.encoder.layer.4.intermediate.dense.bias", "models.0.bert4local.encoder.layer.4.output.dense.weight", "models.0.bert4local.encoder.layer.4.output.dense.bias", "models.0.bert4local.encoder.layer.4.output.LayerNorm.weight", "models.0.bert4local.encoder.layer.4.output.LayerNorm.bias", "models.0.bert4local.encoder.layer.5.attention.self.query_proj.weight", "models.0.bert4local.encoder.layer.5.attention.self.query_proj.bias", "models.0.bert4local.encoder.layer.5.attention.self.key_proj.weight", "models.0.bert4local.encoder.layer.5.attention.self.key_proj.bias", "models.0.bert4local.encoder.layer.5.attention.self.value_proj.weight", "models.0.bert4local.encoder.layer.5.attention.self.value_proj.bias", "models.0.bert4local.encoder.layer.5.attention.output.dense.weight", "models.0.bert4local.encoder.layer.5.attention.output.dense.bias", "models.0.bert4local.encoder.layer.5.attention.output.LayerNorm.weight", "models.0.bert4local.encoder.layer.5.attention.output.LayerNorm.bias", 
"models.0.bert4local.encoder.layer.5.intermediate.dense.weight", "models.0.bert4local.encoder.layer.5.intermediate.dense.bias", "models.0.bert4local.encoder.layer.5.output.dense.weight", "models.0.bert4local.encoder.layer.5.output.dense.bias", "models.0.bert4local.encoder.layer.5.output.LayerNorm.weight", "models.0.bert4local.encoder.layer.5.output.LayerNorm.bias", "models.0.bert4local.encoder.layer.6.attention.self.query_proj.weight", "models.0.bert4local.encoder.layer.6.attention.self.query_proj.bias", "models.0.bert4local.encoder.layer.6.attention.self.key_proj.weight", "models.0.bert4local.encoder.layer.6.attention.self.key_proj.bias", "models.0.bert4local.encoder.layer.6.attention.self.value_proj.weight", "models.0.bert4local.encoder.layer.6.attention.self.value_proj.bias", "models.0.bert4local.encoder.layer.6.attention.output.dense.weight", "models.0.bert4local.encoder.layer.6.attention.output.dense.bias", "models.0.bert4local.encoder.layer.6.attention.output.LayerNorm.weight", "models.0.bert4local.encoder.layer.6.attention.output.LayerNorm.bias", "models.0.bert4local.encoder.layer.6.intermediate.dense.weight", "models.0.bert4local.encoder.layer.6.intermediate.dense.bias", "models.0.bert4local.encoder.layer.6.output.dense.weight", "models.0.bert4local.encoder.layer.6.output.dense.bias", "models.0.bert4local.encoder.layer.6.output.LayerNorm.weight", "models.0.bert4local.encoder.layer.6.output.LayerNorm.bias", "models.0.bert4local.encoder.layer.7.attention.self.query_proj.weight", "models.0.bert4local.encoder.layer.7.attention.self.query_proj.bias", "models.0.bert4local.encoder.layer.7.attention.self.key_proj.weight", "models.0.bert4local.encoder.layer.7.attention.self.key_proj.bias", "models.0.bert4local.encoder.layer.7.attention.self.value_proj.weight", "models.0.bert4local.encoder.layer.7.attention.self.value_proj.bias", "models.0.bert4local.encoder.layer.7.attention.output.dense.weight", "models.0.bert4local.encoder.layer.7.attention.output.dense.bias", 
"models.0.bert4local.encoder.layer.7.attention.output.LayerNorm.weight", "models.0.bert4local.encoder.layer.7.attention.output.LayerNorm.bias", "models.0.bert4local.encoder.layer.7.intermediate.dense.weight", "models.0.bert4local.encoder.layer.7.intermediate.dense.bias", "models.0.bert4local.encoder.layer.7.output.dense.weight", "models.0.bert4local.encoder.layer.7.output.dense.bias", "models.0.bert4local.encoder.layer.7.output.LayerNorm.weight", "models.0.bert4local.encoder.layer.7.output.LayerNorm.bias", "models.0.bert4local.encoder.layer.8.attention.self.query_proj.weight", "models.0.bert4local.encoder.layer.8.attention.self.query_proj.bias", "models.0.bert4local.encoder.layer.8.attention.self.key_proj.weight", "models.0.bert4local.encoder.layer.8.attention.self.key_proj.bias", "models.0.bert4local.encoder.layer.8.attention.self.value_proj.weight", "models.0.bert4local.encoder.layer.8.attention.self.value_proj.bias", "models.0.bert4local.encoder.layer.8.attention.output.dense.weight", "models.0.bert4local.encoder.layer.8.attention.output.dense.bias", "models.0.bert4local.encoder.layer.8.attention.output.LayerNorm.weight", "models.0.bert4local.encoder.layer.8.attention.output.LayerNorm.bias", "models.0.bert4local.encoder.layer.8.intermediate.dense.weight", "models.0.bert4local.encoder.layer.8.intermediate.dense.bias", "models.0.bert4local.encoder.layer.8.output.dense.weight", "models.0.bert4local.encoder.layer.8.output.dense.bias", "models.0.bert4local.encoder.layer.8.output.LayerNorm.weight", "models.0.bert4local.encoder.layer.8.output.LayerNorm.bias", "models.0.bert4local.encoder.layer.9.attention.self.query_proj.weight", "models.0.bert4local.encoder.layer.9.attention.self.query_proj.bias", "models.0.bert4local.encoder.layer.9.attention.self.key_proj.weight", "models.0.bert4local.encoder.layer.9.attention.self.key_proj.bias", "models.0.bert4local.encoder.layer.9.attention.self.value_proj.weight", 
"models.0.bert4local.encoder.layer.9.attention.self.value_proj.bias", "models.0.bert4local.encoder.layer.9.attention.output.dense.weight", "models.0.bert4local.encoder.layer.9.attention.output.dense.bias", "models.0.bert4local.encoder.layer.9.attention.output.LayerNorm.weight", "models.0.bert4local.encoder.layer.9.attention.output.LayerNorm.bias", "models.0.bert4local.encoder.layer.9.intermediate.dense.weight", "models.0.bert4local.encoder.layer.9.intermediate.dense.bias", "models.0.bert4local.encoder.layer.9.output.dense.weight", "models.0.bert4local.encoder.layer.9.output.dense.bias", "models.0.bert4local.encoder.layer.9.output.LayerNorm.weight", "models.0.bert4local.encoder.layer.9.output.LayerNorm.bias", "models.0.bert4local.encoder.layer.10.attention.self.query_proj.weight", "models.0.bert4local.encoder.layer.10.attention.self.query_proj.bias", "models.0.bert4local.encoder.layer.10.attention.self.key_proj.weight", "models.0.bert4local.encoder.layer.10.attention.self.key_proj.bias", "models.0.bert4local.encoder.layer.10.attention.self.value_proj.weight", "models.0.bert4local.encoder.layer.10.attention.self.value_proj.bias", "models.0.bert4local.encoder.layer.10.attention.output.dense.weight", "models.0.bert4local.encoder.layer.10.attention.output.dense.bias", "models.0.bert4local.encoder.layer.10.attention.output.LayerNorm.weight", "models.0.bert4local.encoder.layer.10.attention.output.LayerNorm.bias", "models.0.bert4local.encoder.layer.10.intermediate.dense.weight", "models.0.bert4local.encoder.layer.10.intermediate.dense.bias", "models.0.bert4local.encoder.layer.10.output.dense.weight", "models.0.bert4local.encoder.layer.10.output.dense.bias", "models.0.bert4local.encoder.layer.10.output.LayerNorm.weight", "models.0.bert4local.encoder.layer.10.output.LayerNorm.bias", "models.0.bert4local.encoder.layer.11.attention.self.query_proj.weight", "models.0.bert4local.encoder.layer.11.attention.self.query_proj.bias", 
"models.0.bert4local.encoder.layer.11.attention.self.key_proj.weight", "models.0.bert4local.encoder.layer.11.attention.self.key_proj.bias", "models.0.bert4local.encoder.layer.11.attention.self.value_proj.weight", "models.0.bert4local.encoder.layer.11.attention.self.value_proj.bias", "models.0.bert4local.encoder.layer.11.attention.output.dense.weight", "models.0.bert4local.encoder.layer.11.attention.output.dense.bias", "models.0.bert4local.encoder.layer.11.attention.output.LayerNorm.weight", "models.0.bert4local.encoder.layer.11.attention.output.LayerNorm.bias", "models.0.bert4local.encoder.layer.11.intermediate.dense.weight", "models.0.bert4local.encoder.layer.11.intermediate.dense.bias", "models.0.bert4local.encoder.layer.11.output.dense.weight", "models.0.bert4local.encoder.layer.11.output.dense.bias", "models.0.bert4local.encoder.layer.11.output.LayerNorm.weight", "models.0.bert4local.encoder.layer.11.output.LayerNorm.bias", "models.0.bert4local.encoder.rel_embeddings.weight", "models.0.bert4local.encoder.LayerNorm.weight", "models.0.bert4local.encoder.LayerNorm.bias", "models.0.bert_SA.encoder.0.SA.query.weight", "models.0.bert_SA.encoder.0.SA.query.bias", "models.0.bert_SA.encoder.0.SA.key.weight", "models.0.bert_SA.encoder.0.SA.key.bias", "models.0.bert_SA.encoder.0.SA.value.weight", "models.0.bert_SA.encoder.0.SA.value.bias", "models.0.linear2.weight", "models.0.linear2.bias", "models.0.bert_SA_.encoder.0.SA.query.weight", "models.0.bert_SA_.encoder.0.SA.query.bias", "models.0.bert_SA_.encoder.0.SA.key.weight", "models.0.bert_SA_.encoder.0.SA.key.bias", "models.0.bert_SA_.encoder.0.SA.value.weight", "models.0.bert_SA_.encoder.0.SA.value.bias", "models.0.bert_pooler.dense.weight", "models.0.bert_pooler.dense.bias", "models.0.dense.weight", "models.0.dense.bias", "bert.embeddings.position_ids", "bert.embeddings.word_embeddings.weight", "bert.embeddings.LayerNorm.weight", "bert.embeddings.LayerNorm.bias", "bert.encoder.layer.0.attention.self.query_proj.weight", 
"bert.encoder.layer.0.attention.self.query_proj.bias", "bert.encoder.layer.0.attention.self.key_proj.weight", "bert.encoder.layer.0.attention.self.key_proj.bias", "bert.encoder.layer.0.attention.self.value_proj.weight", "bert.encoder.layer.0.attention.self.value_proj.bias", "bert.encoder.layer.0.attention.output.dense.weight", "bert.encoder.layer.0.attention.output.dense.bias", "bert.encoder.layer.0.attention.output.LayerNorm.weight", "bert.encoder.layer.0.attention.output.LayerNorm.bias", "bert.encoder.layer.0.intermediate.dense.weight", "bert.encoder.layer.0.intermediate.dense.bias", "bert.encoder.layer.0.output.dense.weight", "bert.encoder.layer.0.output.dense.bias", "bert.encoder.layer.0.output.LayerNorm.weight", "bert.encoder.layer.0.output.LayerNorm.bias", "bert.encoder.layer.1.attention.self.query_proj.weight", "bert.encoder.layer.1.attention.self.query_proj.bias", "bert.encoder.layer.1.attention.self.key_proj.weight", "bert.encoder.layer.1.attention.self.key_proj.bias", "bert.encoder.layer.1.attention.self.value_proj.weight", "bert.encoder.layer.1.attention.self.value_proj.bias", "bert.encoder.layer.1.attention.output.dense.weight", "bert.encoder.layer.1.attention.output.dense.bias", "bert.encoder.layer.1.attention.output.LayerNorm.weight", "bert.encoder.layer.1.attention.output.LayerNorm.bias", "bert.encoder.layer.1.intermediate.dense.weight", "bert.encoder.layer.1.intermediate.dense.bias", "bert.encoder.layer.1.output.dense.weight", "bert.encoder.layer.1.output.dense.bias", "bert.encoder.layer.1.output.LayerNorm.weight", "bert.encoder.layer.1.output.LayerNorm.bias", "bert.encoder.layer.2.attention.self.query_proj.weight", "bert.encoder.layer.2.attention.self.query_proj.bias", "bert.encoder.layer.2.attention.self.key_proj.weight", "bert.encoder.layer.2.attention.self.key_proj.bias", "bert.encoder.layer.2.attention.self.value_proj.weight", "bert.encoder.layer.2.attention.self.value_proj.bias", "bert.encoder.layer.2.attention.output.dense.weight", 
"bert.encoder.layer.2.attention.output.dense.bias", "bert.encoder.layer.2.attention.output.LayerNorm.weight", "bert.encoder.layer.2.attention.output.LayerNorm.bias", "bert.encoder.layer.2.intermediate.dense.weight", "bert.encoder.layer.2.intermediate.dense.bias", "bert.encoder.layer.2.output.dense.weight", "bert.encoder.layer.2.output.dense.bias", "bert.encoder.layer.2.output.LayerNorm.weight", "bert.encoder.layer.2.output.LayerNorm.bias", "bert.encoder.layer.3.attention.self.query_proj.weight", "bert.encoder.layer.3.attention.self.query_proj.bias", "bert.encoder.layer.3.attention.self.key_proj.weight", "bert.encoder.layer.3.attention.self.key_proj.bias", "bert.encoder.layer.3.attention.self.value_proj.weight", "bert.encoder.layer.3.attention.self.value_proj.bias", "bert.encoder.layer.3.attention.output.dense.weight", "bert.encoder.layer.3.attention.output.dense.bias", "bert.encoder.layer.3.attention.output.LayerNorm.weight", "bert.encoder.layer.3.attention.output.LayerNorm.bias", "bert.encoder.layer.3.intermediate.dense.weight", "bert.encoder.layer.3.intermediate.dense.bias", "bert.encoder.layer.3.output.dense.weight", "bert.encoder.layer.3.output.dense.bias", "bert.encoder.layer.3.output.LayerNorm.weight", "bert.encoder.layer.3.output.LayerNorm.bias", "bert.encoder.layer.4.attention.self.query_proj.weight", "bert.encoder.layer.4.attention.self.query_proj.bias", "bert.encoder.layer.4.attention.self.key_proj.weight", "bert.encoder.layer.4.attention.self.key_proj.bias", "bert.encoder.layer.4.attention.self.value_proj.weight", "bert.encoder.layer.4.attention.self.value_proj.bias", "bert.encoder.layer.4.attention.output.dense.weight", "bert.encoder.layer.4.attention.output.dense.bias", "bert.encoder.layer.4.attention.output.LayerNorm.weight", "bert.encoder.layer.4.attention.output.LayerNorm.bias", "bert.encoder.layer.4.intermediate.dense.weight", "bert.encoder.layer.4.intermediate.dense.bias", "bert.encoder.layer.4.output.dense.weight", 
"bert.encoder.layer.4.output.dense.bias", "bert.encoder.layer.4.output.LayerNorm.weight", "bert.encoder.layer.4.output.LayerNorm.bias", "bert.encoder.layer.5.attention.self.query_proj.weight", "bert.encoder.layer.5.attention.self.query_proj.bias", "bert.encoder.layer.5.attention.self.key_proj.weight", "bert.encoder.layer.5.attention.self.key_proj.bias", "bert.encoder.layer.5.attention.self.value_proj.weight", "bert.encoder.layer.5.attention.self.value_proj.bias", "bert.encoder.layer.5.attention.output.dense.weight", "bert.encoder.layer.5.attention.output.dense.bias", "bert.encoder.layer.5.attention.output.LayerNorm.weight", "bert.encoder.layer.5.attention.output.LayerNorm.bias", "bert.encoder.layer.5.intermediate.dense.weight", "bert.encoder.layer.5.intermediate.dense.bias", "bert.encoder.layer.5.output.dense.weight", "bert.encoder.layer.5.output.dense.bias", "bert.encoder.layer.5.output.LayerNorm.weight", "bert.encoder.layer.5.output.LayerNorm.bias", "bert.encoder.layer.6.attention.self.query_proj.weight", "bert.encoder.layer.6.attention.self.query_proj.bias", "bert.encoder.layer.6.attention.self.key_proj.weight", "bert.encoder.layer.6.attention.self.key_proj.bias", "bert.encoder.layer.6.attention.self.value_proj.weight", "bert.encoder.layer.6.attention.self.value_proj.bias", "bert.encoder.layer.6.attention.output.dense.weight", "bert.encoder.layer.6.attention.output.dense.bias", "bert.encoder.layer.6.attention.output.LayerNorm.weight", "bert.encoder.layer.6.attention.output.LayerNorm.bias", "bert.encoder.layer.6.intermediate.dense.weight", "bert.encoder.layer.6.intermediate.dense.bias", "bert.encoder.layer.6.output.dense.weight", "bert.encoder.layer.6.output.dense.bias", "bert.encoder.layer.6.output.LayerNorm.weight", "bert.encoder.layer.6.output.LayerNorm.bias", "bert.encoder.layer.7.attention.self.query_proj.weight", "bert.encoder.layer.7.attention.self.query_proj.bias", "bert.encoder.layer.7.attention.self.key_proj.weight", 
"bert.encoder.layer.7.attention.self.key_proj.bias", "bert.encoder.layer.7.attention.self.value_proj.weight", "bert.encoder.layer.7.attention.self.value_proj.bias", "bert.encoder.layer.7.attention.output.dense.weight", "bert.encoder.layer.7.attention.output.dense.bias", "bert.encoder.layer.7.attention.output.LayerNorm.weight", "bert.encoder.layer.7.attention.output.LayerNorm.bias", "bert.encoder.layer.7.intermediate.dense.weight", "bert.encoder.layer.7.intermediate.dense.bias", "bert.encoder.layer.7.output.dense.weight", "bert.encoder.layer.7.output.dense.bias", "bert.encoder.layer.7.output.LayerNorm.weight", "bert.encoder.layer.7.output.LayerNorm.bias", "bert.encoder.layer.8.attention.self.query_proj.weight", "bert.encoder.layer.8.attention.self.query_proj.bias", "bert.encoder.layer.8.attention.self.key_proj.weight", "bert.encoder.layer.8.attention.self.key_proj.bias", "bert.encoder.layer.8.attention.self.value_proj.weight", "bert.encoder.layer.8.attention.self.value_proj.bias", "bert.encoder.layer.8.attention.output.dense.weight", "bert.encoder.layer.8.attention.output.dense.bias", "bert.encoder.layer.8.attention.output.LayerNorm.weight", "bert.encoder.layer.8.attention.output.LayerNorm.bias", "bert.encoder.layer.8.intermediate.dense.weight", "bert.encoder.layer.8.intermediate.dense.bias", "bert.encoder.layer.8.output.dense.weight", "bert.encoder.layer.8.output.dense.bias", "bert.encoder.layer.8.output.LayerNorm.weight", "bert.encoder.layer.8.output.LayerNorm.bias", "bert.encoder.layer.9.attention.self.query_proj.weight", "bert.encoder.layer.9.attention.self.query_proj.bias", "bert.encoder.layer.9.attention.self.key_proj.weight", "bert.encoder.layer.9.attention.self.key_proj.bias", "bert.encoder.layer.9.attention.self.value_proj.weight", "bert.encoder.layer.9.attention.self.value_proj.bias", "bert.encoder.layer.9.attention.output.dense.weight", "bert.encoder.layer.9.attention.output.dense.bias", "bert.encoder.layer.9.attention.output.LayerNorm.weight", 
"bert.encoder.layer.9.attention.output.LayerNorm.bias", "bert.encoder.layer.9.intermediate.dense.weight", "bert.encoder.layer.9.intermediate.dense.bias", "bert.encoder.layer.9.output.dense.weight", "bert.encoder.layer.9.output.dense.bias", "bert.encoder.layer.9.output.LayerNorm.weight", "bert.encoder.layer.9.output.LayerNorm.bias", "bert.encoder.layer.10.attention.self.query_proj.weight", "bert.encoder.layer.10.attention.self.query_proj.bias", "bert.encoder.layer.10.attention.self.key_proj.weight", "bert.encoder.layer.10.attention.self.key_proj.bias", "bert.encoder.layer.10.attention.self.value_proj.weight", "bert.encoder.layer.10.attention.self.value_proj.bias", "bert.encoder.layer.10.attention.output.dense.weight", "bert.encoder.layer.10.attention.output.dense.bias", "bert.encoder.layer.10.attention.output.LayerNorm.weight", "bert.encoder.layer.10.attention.output.LayerNorm.bias", "bert.encoder.layer.10.intermediate.dense.weight", "bert.encoder.layer.10.intermediate.dense.bias", "bert.encoder.layer.10.output.dense.weight", "bert.encoder.layer.10.output.dense.bias", "bert.encoder.layer.10.output.LayerNorm.weight", "bert.encoder.layer.10.output.LayerNorm.bias", "bert.encoder.layer.11.attention.self.query_proj.weight", "bert.encoder.layer.11.attention.self.query_proj.bias", "bert.encoder.layer.11.attention.self.key_proj.weight", "bert.encoder.layer.11.attention.self.key_proj.bias", "bert.encoder.layer.11.attention.self.value_proj.weight", "bert.encoder.layer.11.attention.self.value_proj.bias", "bert.encoder.layer.11.attention.output.dense.weight", "bert.encoder.layer.11.attention.output.dense.bias", "bert.encoder.layer.11.attention.output.LayerNorm.weight", "bert.encoder.layer.11.attention.output.LayerNorm.bias", "bert.encoder.layer.11.intermediate.dense.weight", "bert.encoder.layer.11.intermediate.dense.bias", "bert.encoder.layer.11.output.dense.weight", "bert.encoder.layer.11.output.dense.bias", "bert.encoder.layer.11.output.LayerNorm.weight", 
"bert.encoder.layer.11.output.LayerNorm.bias", "bert.encoder.rel_embeddings.weight", "bert.encoder.LayerNorm.weight", "bert.encoder.LayerNorm.bias", "dense.weight", "dense.bias".
This is not the expected usage: yangheng/deberta-v3-base-absa-v1.1 is a stand-alone model (a universal architecture from transformers) for aspect sentiment classification, whereas this state_dict was saved from the architecture of PyABSA's APC models, so the keys cannot match.
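To illustrate the mismatch, here is a minimal stdlib-only sketch (with a hypothetical helper name) of the check torch performs in strict mode: loading succeeds only when the checkpoint's keys exactly match the model's parameter names, which an APC state_dict (prefixed with e.g. `models.0.bert4local.`) never will against a plain `DebertaV2Model`.

```python
# Hypothetical sketch of torch's strict-mode key check: loading succeeds
# only when checkpoint keys exactly match the model's parameter names.
def check_strict(expected_keys, checkpoint_keys):
    missing = sorted(set(expected_keys) - set(checkpoint_keys))
    unexpected = sorted(set(checkpoint_keys) - set(expected_keys))
    return missing, unexpected

# The plain DebertaV2Model expects unprefixed encoder names...
expected = [
    "embeddings.word_embeddings.weight",
    "encoder.layer.0.attention.self.query_proj.weight",
]
# ...but the PyABSA APC checkpoint wraps them in its own module hierarchy.
checkpoint = [
    "models.0.bert4local.embeddings.word_embeddings.weight",
    "models.0.bert4local.encoder.layer.0.attention.self.query_proj.weight",
]

missing, unexpected = check_strict(expected, checkpoint)
print(missing)     # every expected key is reported as missing
print(unexpected)  # every checkpoint key is reported as unexpected
```

With no overlap at all between the two key sets, every key lands in one of the two error lists, which is exactly the `Missing key(s)` / `Unexpected key(s)` output above.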
I'm getting the exact same error when running the code provided in the tutorial. I therefore dug into the code to identify the problem, and have included the section where the error occurs.
from pyabsa import AspectPolarityClassification as APC
from pyabsa import DeviceTypeOption, ModelSaveOption

config = APC.APCConfigManager.get_apc_config_english()
config.num_epoch = 1
config.model = APC.APCModelList.FAST_LSA_T_V2
dataset = APC.APCDatasetList.Laptop14

trainer = APC.APCTrainer(
    config=config,
    dataset=dataset,
    from_checkpoint="english",
    # pass a checkpoint name here to resume training from a pretrained checkpoint
    auto_device=DeviceTypeOption.AUTO,
    path_to_save=None,  # path for saving checkpoints; None uses the default 'checkpoints' folder
    checkpoint_save_mode=ModelSaveOption.SAVE_MODEL_STATE_DICT,
    load_aug=False,
    # some integrated datasets ship augmentation data; set load_aug=True to use it and improve performance
)
Can you try version 2.3.4rc0? This error is triggered because torch loads the checkpoint in strict mode, while the transformers 4.3x releases refactored the code to remove the position_ids buffer.
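For anyone stuck on an older PyABSA version, one workaround is to drop the stale buffer key before loading (or to pass `strict=False` to `load_state_dict`). A stdlib-only sketch, using a hypothetical helper name and toy values in place of tensors:

```python
# Hypothetical helper: newer transformers releases no longer register
# "position_ids" as a persistent buffer, so checkpoints saved with older
# versions carry a key the rebuilt model does not declare. Filtering it
# out avoids the strict-mode mismatch.
def drop_stale_keys(state_dict, stale_suffixes=("position_ids",)):
    return {
        key: value
        for key, value in state_dict.items()
        if not any(key.endswith(suffix) for suffix in stale_suffixes)
    }

# Toy stand-in for a loaded state_dict (real values would be tensors).
state_dict = {
    "bert.embeddings.position_ids": "stale buffer",
    "bert.embeddings.word_embeddings.weight": "weights",
}
cleaned = drop_stale_keys(state_dict)
print(sorted(cleaned))  # only the word-embedding key survives
```

The same filtering could be applied to a real checkpoint between `torch.load(...)` and `load_state_dict(...)`, though upgrading as suggested above is the cleaner fix.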
Working now! Thanks for fixing this! :)