Problem loading checkpoint for AspectPolarityClassification
zedavid opened this issue · comments
Please provide the REQUIRED information. Otherwise, it is almost impossible to locate the problem. DO NOT CHANGE THE FORM.
PyABSA Version (Required)
Python Version: 3.11.15
PyABSA Version: 2.1.6
Torch Version: 2.1.1+cu118
Transformers Version: 4.35.2
Describe the bug
I'm trying to run the recipe for aspect-level polarity classification, following the steps in the documentation tutorial. When I load the model from the trained checkpoint, the state_dict file does not contain the keys the model architecture expects.
Code To Reproduce (Required)
import torch
from transformers import AutoModel

# Load the DeBERTa backbone, then try to restore the PyABSA checkpoint into it directly.
pyabsa_model = AutoModel.from_pretrained('yangheng/deberta-v3-base-absa-v1.1')
pyabsa_model.load_state_dict(torch.load('fast_lcf_bert.state_dict'))  # raises RuntimeError below
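For context on why the checkpoint keys look the way they do: PyTorch state_dict keys mirror the attribute path of each submodule, so a backbone stored under a `models` ModuleList as attribute `bert4global` gets every key prefixed with `models.0.bert4global.`, exactly as in the traceback below. A minimal sketch with toy stand-in modules (the names mimic the checkpoint, not PyABSA's actual classes):

```python
import torch.nn as nn

# Toy stand-in for the FAST_LCF_BERT backbone.
class Backbone(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(2, 2)

# Wrapper that nests the backbone the way the checkpoint keys suggest.
class Wrapper(nn.Module):
    def __init__(self):
        super().__init__()
        inner = nn.Module()
        inner.bert4global = Backbone()  # attribute name becomes part of the key
        self.models = nn.ModuleList([inner])

keys = list(Wrapper().state_dict().keys())
# keys contain e.g. 'models.0.bert4global.encoder.weight'
```

This is why a bare `DebertaV2Model`, whose keys start at `embeddings.` / `encoder.`, cannot consume the checkpoint as-is.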
Full Console Output (Required)
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
RuntimeError: Error(s) in loading state_dict for DebertaV2Model:
Missing key(s) in state_dict: "embeddings.word_embeddings.weight", "embeddings.LayerNorm.weight", "embeddings.LayerNorm.bias", "encoder.layer.0.attention.self.query_proj.weight", "encoder.layer.0.attention.self.query_proj.bias", "encoder.layer.0.attention.self.key_proj.weight", "encoder.layer.0.attention.self.key_proj.bias", "encoder.layer.0.attention.self.value_proj.weight", "encoder.layer.0.attention.self.value_proj.bias", "encoder.layer.0.attention.output.dense.weight", "encoder.layer.0.attention.output.dense.bias", "encoder.layer.0.attention.output.LayerNorm.weight", "encoder.layer.0.attention.output.LayerNorm.bias", "encoder.layer.0.intermediate.dense.weight", "encoder.layer.0.intermediate.dense.bias", "encoder.layer.0.output.dense.weight", "encoder.layer.0.output.dense.bias", "encoder.layer.0.output.LayerNorm.weight", "encoder.layer.0.output.LayerNorm.bias", "encoder.layer.1.attention.self.query_proj.weight", "encoder.layer.1.attention.self.query_proj.bias", "encoder.layer.1.attention.self.key_proj.weight", "encoder.layer.1.attention.self.key_proj.bias", "encoder.layer.1.attention.self.value_proj.weight", "encoder.layer.1.attention.self.value_proj.bias", "encoder.layer.1.attention.output.dense.weight", "encoder.layer.1.attention.output.dense.bias", "encoder.layer.1.attention.output.LayerNorm.weight", "encoder.layer.1.attention.output.LayerNorm.bias", "encoder.layer.1.intermediate.dense.weight", "encoder.layer.1.intermediate.dense.bias", "encoder.layer.1.output.dense.weight", "encoder.layer.1.output.dense.bias", "encoder.layer.1.output.LayerNorm.weight", "encoder.layer.1.output.LayerNorm.bias", "encoder.layer.2.attention.self.query_proj.weight", "encoder.layer.2.attention.self.query_proj.bias", "encoder.layer.2.attention.self.key_proj.weight", "encoder.layer.2.attention.self.key_proj.bias", "encoder.layer.2.attention.self.value_proj.weight", "encoder.layer.2.attention.self.value_proj.bias", "encoder.layer.2.attention.output.dense.weight", 
"encoder.layer.2.attention.output.dense.bias", "encoder.layer.2.attention.output.LayerNorm.weight", "encoder.layer.2.attention.output.LayerNorm.bias", "encoder.layer.2.intermediate.dense.weight", "encoder.layer.2.intermediate.dense.bias", "encoder.layer.2.output.dense.weight", "encoder.layer.2.output.dense.bias", "encoder.layer.2.output.LayerNorm.weight", "encoder.layer.2.output.LayerNorm.bias", "encoder.layer.3.attention.self.query_proj.weight", "encoder.layer.3.attention.self.query_proj.bias", "encoder.layer.3.attention.self.key_proj.weight", "encoder.layer.3.attention.self.key_proj.bias", "encoder.layer.3.attention.self.value_proj.weight", "encoder.layer.3.attention.self.value_proj.bias", "encoder.layer.3.attention.output.dense.weight", "encoder.layer.3.attention.output.dense.bias", "encoder.layer.3.attention.output.LayerNorm.weight", "encoder.layer.3.attention.output.LayerNorm.bias", "encoder.layer.3.intermediate.dense.weight", "encoder.layer.3.intermediate.dense.bias", "encoder.layer.3.output.dense.weight", "encoder.layer.3.output.dense.bias", "encoder.layer.3.output.LayerNorm.weight", "encoder.layer.3.output.LayerNorm.bias", "encoder.layer.4.attention.self.query_proj.weight", "encoder.layer.4.attention.self.query_proj.bias", "encoder.layer.4.attention.self.key_proj.weight", "encoder.layer.4.attention.self.key_proj.bias", "encoder.layer.4.attention.self.value_proj.weight", "encoder.layer.4.attention.self.value_proj.bias", "encoder.layer.4.attention.output.dense.weight", "encoder.layer.4.attention.output.dense.bias", "encoder.layer.4.attention.output.LayerNorm.weight", "encoder.layer.4.attention.output.LayerNorm.bias", "encoder.layer.4.intermediate.dense.weight", "encoder.layer.4.intermediate.dense.bias", "encoder.layer.4.output.dense.weight", "encoder.layer.4.output.dense.bias", "encoder.layer.4.output.LayerNorm.weight", "encoder.layer.4.output.LayerNorm.bias", "encoder.layer.5.attention.self.query_proj.weight", 
"encoder.layer.5.attention.self.query_proj.bias", "encoder.layer.5.attention.self.key_proj.weight", "encoder.layer.5.attention.self.key_proj.bias", "encoder.layer.5.attention.self.value_proj.weight", "encoder.layer.5.attention.self.value_proj.bias", "encoder.layer.5.attention.output.dense.weight", "encoder.layer.5.attention.output.dense.bias", "encoder.layer.5.attention.output.LayerNorm.weight", "encoder.layer.5.attention.output.LayerNorm.bias", "encoder.layer.5.intermediate.dense.weight", "encoder.layer.5.intermediate.dense.bias", "encoder.layer.5.output.dense.weight", "encoder.layer.5.output.dense.bias", "encoder.layer.5.output.LayerNorm.weight", "encoder.layer.5.output.LayerNorm.bias", "encoder.layer.6.attention.self.query_proj.weight", "encoder.layer.6.attention.self.query_proj.bias", "encoder.layer.6.attention.self.key_proj.weight", "encoder.layer.6.attention.self.key_proj.bias", "encoder.layer.6.attention.self.value_proj.weight", "encoder.layer.6.attention.self.value_proj.bias", "encoder.layer.6.attention.output.dense.weight", "encoder.layer.6.attention.output.dense.bias", "encoder.layer.6.attention.output.LayerNorm.weight", "encoder.layer.6.attention.output.LayerNorm.bias", "encoder.layer.6.intermediate.dense.weight", "encoder.layer.6.intermediate.dense.bias", "encoder.layer.6.output.dense.weight", "encoder.layer.6.output.dense.bias", "encoder.layer.6.output.LayerNorm.weight", "encoder.layer.6.output.LayerNorm.bias", "encoder.layer.7.attention.self.query_proj.weight", "encoder.layer.7.attention.self.query_proj.bias", "encoder.layer.7.attention.self.key_proj.weight", "encoder.layer.7.attention.self.key_proj.bias", "encoder.layer.7.attention.self.value_proj.weight", "encoder.layer.7.attention.self.value_proj.bias", "encoder.layer.7.attention.output.dense.weight", "encoder.layer.7.attention.output.dense.bias", "encoder.layer.7.attention.output.LayerNorm.weight", "encoder.layer.7.attention.output.LayerNorm.bias", "encoder.layer.7.intermediate.dense.weight", 
"encoder.layer.7.intermediate.dense.bias", "encoder.layer.7.output.dense.weight", "encoder.layer.7.output.dense.bias", "encoder.layer.7.output.LayerNorm.weight", "encoder.layer.7.output.LayerNorm.bias", "encoder.layer.8.attention.self.query_proj.weight", "encoder.layer.8.attention.self.query_proj.bias", "encoder.layer.8.attention.self.key_proj.weight", "encoder.layer.8.attention.self.key_proj.bias", "encoder.layer.8.attention.self.value_proj.weight", "encoder.layer.8.attention.self.value_proj.bias", "encoder.layer.8.attention.output.dense.weight", "encoder.layer.8.attention.output.dense.bias", "encoder.layer.8.attention.output.LayerNorm.weight", "encoder.layer.8.attention.output.LayerNorm.bias", "encoder.layer.8.intermediate.dense.weight", "encoder.layer.8.intermediate.dense.bias", "encoder.layer.8.output.dense.weight", "encoder.layer.8.output.dense.bias", "encoder.layer.8.output.LayerNorm.weight", "encoder.layer.8.output.LayerNorm.bias", "encoder.layer.9.attention.self.query_proj.weight", "encoder.layer.9.attention.self.query_proj.bias", "encoder.layer.9.attention.self.key_proj.weight", "encoder.layer.9.attention.self.key_proj.bias", "encoder.layer.9.attention.self.value_proj.weight", "encoder.layer.9.attention.self.value_proj.bias", "encoder.layer.9.attention.output.dense.weight", "encoder.layer.9.attention.output.dense.bias", "encoder.layer.9.attention.output.LayerNorm.weight", "encoder.layer.9.attention.output.LayerNorm.bias", "encoder.layer.9.intermediate.dense.weight", "encoder.layer.9.intermediate.dense.bias", "encoder.layer.9.output.dense.weight", "encoder.layer.9.output.dense.bias", "encoder.layer.9.output.LayerNorm.weight", "encoder.layer.9.output.LayerNorm.bias", "encoder.layer.10.attention.self.query_proj.weight", "encoder.layer.10.attention.self.query_proj.bias", "encoder.layer.10.attention.self.key_proj.weight", "encoder.layer.10.attention.self.key_proj.bias", "encoder.layer.10.attention.self.value_proj.weight", 
"encoder.layer.10.attention.self.value_proj.bias", "encoder.layer.10.attention.output.dense.weight", "encoder.layer.10.attention.output.dense.bias", "encoder.layer.10.attention.output.LayerNorm.weight", "encoder.layer.10.attention.output.LayerNorm.bias", "encoder.layer.10.intermediate.dense.weight", "encoder.layer.10.intermediate.dense.bias", "encoder.layer.10.output.dense.weight", "encoder.layer.10.output.dense.bias", "encoder.layer.10.output.LayerNorm.weight", "encoder.layer.10.output.LayerNorm.bias", "encoder.layer.11.attention.self.query_proj.weight", "encoder.layer.11.attention.self.query_proj.bias", "encoder.layer.11.attention.self.key_proj.weight", "encoder.layer.11.attention.self.key_proj.bias", "encoder.layer.11.attention.self.value_proj.weight", "encoder.layer.11.attention.self.value_proj.bias", "encoder.layer.11.attention.output.dense.weight", "encoder.layer.11.attention.output.dense.bias", "encoder.layer.11.attention.output.LayerNorm.weight", "encoder.layer.11.attention.output.LayerNorm.bias", "encoder.layer.11.intermediate.dense.weight", "encoder.layer.11.intermediate.dense.bias", "encoder.layer.11.output.dense.weight", "encoder.layer.11.output.dense.bias", "encoder.layer.11.output.LayerNorm.weight", "encoder.layer.11.output.LayerNorm.bias", "encoder.rel_embeddings.weight", "encoder.LayerNorm.weight", "encoder.LayerNorm.bias".
Unexpected key(s) in state_dict: "models.0.bert4global.embeddings.position_ids", "models.0.bert4global.embeddings.word_embeddings.weight", "models.0.bert4global.embeddings.LayerNorm.weight", "models.0.bert4global.embeddings.LayerNorm.bias", "models.0.bert4global.encoder.layer.0.attention.self.query_proj.weight", "models.0.bert4global.encoder.layer.0.attention.self.query_proj.bias", "models.0.bert4global.encoder.layer.0.attention.self.key_proj.weight", "models.0.bert4global.encoder.layer.0.attention.self.key_proj.bias", "models.0.bert4global.encoder.layer.0.attention.self.value_proj.weight", "models.0.bert4global.encoder.layer.0.attention.self.value_proj.bias", "models.0.bert4global.encoder.layer.0.attention.output.dense.weight", "models.0.bert4global.encoder.layer.0.attention.output.dense.bias", "models.0.bert4global.encoder.layer.0.attention.output.LayerNorm.weight", "models.0.bert4global.encoder.layer.0.attention.output.LayerNorm.bias", "models.0.bert4global.encoder.layer.0.intermediate.dense.weight", "models.0.bert4global.encoder.layer.0.intermediate.dense.bias", "models.0.bert4global.encoder.layer.0.output.dense.weight", "models.0.bert4global.encoder.layer.0.output.dense.bias", "models.0.bert4global.encoder.layer.0.output.LayerNorm.weight", "models.0.bert4global.encoder.layer.0.output.LayerNorm.bias", "models.0.bert4global.encoder.layer.1.attention.self.query_proj.weight", "models.0.bert4global.encoder.layer.1.attention.self.query_proj.bias", "models.0.bert4global.encoder.layer.1.attention.self.key_proj.weight", "models.0.bert4global.encoder.layer.1.attention.self.key_proj.bias", "models.0.bert4global.encoder.layer.1.attention.self.value_proj.weight", "models.0.bert4global.encoder.layer.1.attention.self.value_proj.bias", "models.0.bert4global.encoder.layer.1.attention.output.dense.weight", "models.0.bert4global.encoder.layer.1.attention.output.dense.bias", "models.0.bert4global.encoder.layer.1.attention.output.LayerNorm.weight", 
"models.0.bert4global.encoder.layer.1.attention.output.LayerNorm.bias", "models.0.bert4global.encoder.layer.1.intermediate.dense.weight", "models.0.bert4global.encoder.layer.1.intermediate.dense.bias", "models.0.bert4global.encoder.layer.1.output.dense.weight", "models.0.bert4global.encoder.layer.1.output.dense.bias", "models.0.bert4global.encoder.layer.1.output.LayerNorm.weight", "models.0.bert4global.encoder.layer.1.output.LayerNorm.bias", "models.0.bert4global.encoder.layer.2.attention.self.query_proj.weight", "models.0.bert4global.encoder.layer.2.attention.self.query_proj.bias", "models.0.bert4global.encoder.layer.2.attention.self.key_proj.weight", "models.0.bert4global.encoder.layer.2.attention.self.key_proj.bias", "models.0.bert4global.encoder.layer.2.attention.self.value_proj.weight", "models.0.bert4global.encoder.layer.2.attention.self.value_proj.bias", "models.0.bert4global.encoder.layer.2.attention.output.dense.weight", "models.0.bert4global.encoder.layer.2.attention.output.dense.bias", "models.0.bert4global.encoder.layer.2.attention.output.LayerNorm.weight", "models.0.bert4global.encoder.layer.2.attention.output.LayerNorm.bias", "models.0.bert4global.encoder.layer.2.intermediate.dense.weight", "models.0.bert4global.encoder.layer.2.intermediate.dense.bias", "models.0.bert4global.encoder.layer.2.output.dense.weight", "models.0.bert4global.encoder.layer.2.output.dense.bias", "models.0.bert4global.encoder.layer.2.output.LayerNorm.weight", "models.0.bert4global.encoder.layer.2.output.LayerNorm.bias", "models.0.bert4global.encoder.layer.3.attention.self.query_proj.weight", "models.0.bert4global.encoder.layer.3.attention.self.query_proj.bias", "models.0.bert4global.encoder.layer.3.attention.self.key_proj.weight", "models.0.bert4global.encoder.layer.3.attention.self.key_proj.bias", "models.0.bert4global.encoder.layer.3.attention.self.value_proj.weight", "models.0.bert4global.encoder.layer.3.attention.self.value_proj.bias", 
"models.0.bert4global.encoder.layer.3.attention.output.dense.weight", "models.0.bert4global.encoder.layer.3.attention.output.dense.bias", "models.0.bert4global.encoder.layer.3.attention.output.LayerNorm.weight", "models.0.bert4global.encoder.layer.3.attention.output.LayerNorm.bias", "models.0.bert4global.encoder.layer.3.intermediate.dense.weight", "models.0.bert4global.encoder.layer.3.intermediate.dense.bias", "models.0.bert4global.encoder.layer.3.output.dense.weight", "models.0.bert4global.encoder.layer.3.output.dense.bias", "models.0.bert4global.encoder.layer.3.output.LayerNorm.weight", "models.0.bert4global.encoder.layer.3.output.LayerNorm.bias", "models.0.bert4global.encoder.layer.4.attention.self.query_proj.weight", "models.0.bert4global.encoder.layer.4.attention.self.query_proj.bias", "models.0.bert4global.encoder.layer.4.attention.self.key_proj.weight", "models.0.bert4global.encoder.layer.4.attention.self.key_proj.bias", "models.0.bert4global.encoder.layer.4.attention.self.value_proj.weight", "models.0.bert4global.encoder.layer.4.attention.self.value_proj.bias", "models.0.bert4global.encoder.layer.4.attention.output.dense.weight", "models.0.bert4global.encoder.layer.4.attention.output.dense.bias", "models.0.bert4global.encoder.layer.4.attention.output.LayerNorm.weight", "models.0.bert4global.encoder.layer.4.attention.output.LayerNorm.bias", "models.0.bert4global.encoder.layer.4.intermediate.dense.weight", "models.0.bert4global.encoder.layer.4.intermediate.dense.bias", "models.0.bert4global.encoder.layer.4.output.dense.weight", "models.0.bert4global.encoder.layer.4.output.dense.bias", "models.0.bert4global.encoder.layer.4.output.LayerNorm.weight", "models.0.bert4global.encoder.layer.4.output.LayerNorm.bias", "models.0.bert4global.encoder.layer.5.attention.self.query_proj.weight", "models.0.bert4global.encoder.layer.5.attention.self.query_proj.bias", "models.0.bert4global.encoder.layer.5.attention.self.key_proj.weight", 
"models.0.bert4global.encoder.layer.5.attention.self.key_proj.bias", "models.0.bert4global.encoder.layer.5.attention.self.value_proj.weight", "models.0.bert4global.encoder.layer.5.attention.self.value_proj.bias", "models.0.bert4global.encoder.layer.5.attention.output.dense.weight", "models.0.bert4global.encoder.layer.5.attention.output.dense.bias", "models.0.bert4global.encoder.layer.5.attention.output.LayerNorm.weight", "models.0.bert4global.encoder.layer.5.attention.output.LayerNorm.bias", "models.0.bert4global.encoder.layer.5.intermediate.dense.weight", "models.0.bert4global.encoder.layer.5.intermediate.dense.bias", "models.0.bert4global.encoder.layer.5.output.dense.weight", "models.0.bert4global.encoder.layer.5.output.dense.bias", "models.0.bert4global.encoder.layer.5.output.LayerNorm.weight", "models.0.bert4global.encoder.layer.5.output.LayerNorm.bias", "models.0.bert4global.encoder.layer.6.attention.self.query_proj.weight", "models.0.bert4global.encoder.layer.6.attention.self.query_proj.bias", "models.0.bert4global.encoder.layer.6.attention.self.key_proj.weight", "models.0.bert4global.encoder.layer.6.attention.self.key_proj.bias", "models.0.bert4global.encoder.layer.6.attention.self.value_proj.weight", "models.0.bert4global.encoder.layer.6.attention.self.value_proj.bias", "models.0.bert4global.encoder.layer.6.attention.output.dense.weight", "models.0.bert4global.encoder.layer.6.attention.output.dense.bias", "models.0.bert4global.encoder.layer.6.attention.output.LayerNorm.weight", "models.0.bert4global.encoder.layer.6.attention.output.LayerNorm.bias", "models.0.bert4global.encoder.layer.6.intermediate.dense.weight", "models.0.bert4global.encoder.layer.6.intermediate.dense.bias", "models.0.bert4global.encoder.layer.6.output.dense.weight", "models.0.bert4global.encoder.layer.6.output.dense.bias", "models.0.bert4global.encoder.layer.6.output.LayerNorm.weight", "models.0.bert4global.encoder.layer.6.output.LayerNorm.bias", 
"models.0.bert4global.encoder.layer.7.attention.self.query_proj.weight", "models.0.bert4global.encoder.layer.7.attention.self.query_proj.bias", "models.0.bert4global.encoder.layer.7.attention.self.key_proj.weight", "models.0.bert4global.encoder.layer.7.attention.self.key_proj.bias", "models.0.bert4global.encoder.layer.7.attention.self.value_proj.weight", "models.0.bert4global.encoder.layer.7.attention.self.value_proj.bias", "models.0.bert4global.encoder.layer.7.attention.output.dense.weight", "models.0.bert4global.encoder.layer.7.attention.output.dense.bias", "models.0.bert4global.encoder.layer.7.attention.output.LayerNorm.weight", "models.0.bert4global.encoder.layer.7.attention.output.LayerNorm.bias", "models.0.bert4global.encoder.layer.7.intermediate.dense.weight", "models.0.bert4global.encoder.layer.7.intermediate.dense.bias", "models.0.bert4global.encoder.layer.7.output.dense.weight", "models.0.bert4global.encoder.layer.7.output.dense.bias", "models.0.bert4global.encoder.layer.7.output.LayerNorm.weight", "models.0.bert4global.encoder.layer.7.output.LayerNorm.bias", "models.0.bert4global.encoder.layer.8.attention.self.query_proj.weight", "models.0.bert4global.encoder.layer.8.attention.self.query_proj.bias", "models.0.bert4global.encoder.layer.8.attention.self.key_proj.weight", "models.0.bert4global.encoder.layer.8.attention.self.key_proj.bias", "models.0.bert4global.encoder.layer.8.attention.self.value_proj.weight", "models.0.bert4global.encoder.layer.8.attention.self.value_proj.bias", "models.0.bert4global.encoder.layer.8.attention.output.dense.weight", "models.0.bert4global.encoder.layer.8.attention.output.dense.bias", "models.0.bert4global.encoder.layer.8.attention.output.LayerNorm.weight", "models.0.bert4global.encoder.layer.8.attention.output.LayerNorm.bias", "models.0.bert4global.encoder.layer.8.intermediate.dense.weight", "models.0.bert4global.encoder.layer.8.intermediate.dense.bias", "models.0.bert4global.encoder.layer.8.output.dense.weight", 
"models.0.bert4global.encoder.layer.8.output.dense.bias", "models.0.bert4global.encoder.layer.8.output.LayerNorm.weight", "models.0.bert4global.encoder.layer.8.output.LayerNorm.bias", "models.0.bert4global.encoder.layer.9.attention.self.query_proj.weight", "models.0.bert4global.encoder.layer.9.attention.self.query_proj.bias", "models.0.bert4global.encoder.layer.9.attention.self.key_proj.weight", "models.0.bert4global.encoder.layer.9.attention.self.key_proj.bias", "models.0.bert4global.encoder.layer.9.attention.self.value_proj.weight", "models.0.bert4global.encoder.layer.9.attention.self.value_proj.bias", "models.0.bert4global.encoder.layer.9.attention.output.dense.weight", "models.0.bert4global.encoder.layer.9.attention.output.dense.bias", "models.0.bert4global.encoder.layer.9.attention.output.LayerNorm.weight", "models.0.bert4global.encoder.layer.9.attention.output.LayerNorm.bias", "models.0.bert4global.encoder.layer.9.intermediate.dense.weight", "models.0.bert4global.encoder.layer.9.intermediate.dense.bias", "models.0.bert4global.encoder.layer.9.output.dense.weight", "models.0.bert4global.encoder.layer.9.output.dense.bias", "models.0.bert4global.encoder.layer.9.output.LayerNorm.weight", "models.0.bert4global.encoder.layer.9.output.LayerNorm.bias", "models.0.bert4global.encoder.layer.10.attention.self.query_proj.weight", "models.0.bert4global.encoder.layer.10.attention.self.query_proj.bias", "models.0.bert4global.encoder.layer.10.attention.self.key_proj.weight", "models.0.bert4global.encoder.layer.10.attention.self.key_proj.bias", "models.0.bert4global.encoder.layer.10.attention.self.value_proj.weight", "models.0.bert4global.encoder.layer.10.attention.self.value_proj.bias", "models.0.bert4global.encoder.layer.10.attention.output.dense.weight", "models.0.bert4global.encoder.layer.10.attention.output.dense.bias", "models.0.bert4global.encoder.layer.10.attention.output.LayerNorm.weight", "models.0.bert4global.encoder.layer.10.attention.output.LayerNorm.bias", 
"models.0.bert4global.encoder.layer.10.intermediate.dense.weight", "models.0.bert4global.encoder.layer.10.intermediate.dense.bias", "models.0.bert4global.encoder.layer.10.output.dense.weight", "models.0.bert4global.encoder.layer.10.output.dense.bias", "models.0.bert4global.encoder.layer.10.output.LayerNorm.weight", "models.0.bert4global.encoder.layer.10.output.LayerNorm.bias", "models.0.bert4global.encoder.layer.11.attention.self.query_proj.weight", "models.0.bert4global.encoder.layer.11.attention.self.query_proj.bias", "models.0.bert4global.encoder.layer.11.attention.self.key_proj.weight", "models.0.bert4global.encoder.layer.11.attention.self.key_proj.bias", "models.0.bert4global.encoder.layer.11.attention.self.value_proj.weight", "models.0.bert4global.encoder.layer.11.attention.self.value_proj.bias", "models.0.bert4global.encoder.layer.11.attention.output.dense.weight", "models.0.bert4global.encoder.layer.11.attention.output.dense.bias", "models.0.bert4global.encoder.layer.11.attention.output.LayerNorm.weight", "models.0.bert4global.encoder.layer.11.attention.output.LayerNorm.bias", "models.0.bert4global.encoder.layer.11.intermediate.dense.weight", "models.0.bert4global.encoder.layer.11.intermediate.dense.bias", "models.0.bert4global.encoder.layer.11.output.dense.weight", "models.0.bert4global.encoder.layer.11.output.dense.bias", "models.0.bert4global.encoder.layer.11.output.LayerNorm.weight", "models.0.bert4global.encoder.layer.11.output.LayerNorm.bias", "models.0.bert4global.encoder.rel_embeddings.weight", "models.0.bert4global.encoder.LayerNorm.weight", "models.0.bert4global.encoder.LayerNorm.bias", "models.0.bert4local.embeddings.position_ids", "models.0.bert4local.embeddings.word_embeddings.weight", "models.0.bert4local.embeddings.LayerNorm.weight", "models.0.bert4local.embeddings.LayerNorm.bias", "models.0.bert4local.encoder.layer.0.attention.self.query_proj.weight", "models.0.bert4local.encoder.layer.0.attention.self.query_proj.bias", 
"models.0.bert4local.encoder.layer.0.attention.self.key_proj.weight", "models.0.bert4local.encoder.layer.0.attention.self.key_proj.bias", "models.0.bert4local.encoder.layer.0.attention.self.value_proj.weight", "models.0.bert4local.encoder.layer.0.attention.self.value_proj.bias", "models.0.bert4local.encoder.layer.0.attention.output.dense.weight", "models.0.bert4local.encoder.layer.0.attention.output.dense.bias", "models.0.bert4local.encoder.layer.0.attention.output.LayerNorm.weight", "models.0.bert4local.encoder.layer.0.attention.output.LayerNorm.bias", "models.0.bert4local.encoder.layer.0.intermediate.dense.weight", "models.0.bert4local.encoder.layer.0.intermediate.dense.bias", "models.0.bert4local.encoder.layer.0.output.dense.weight", "models.0.bert4local.encoder.layer.0.output.dense.bias", "models.0.bert4local.encoder.layer.0.output.LayerNorm.weight", "models.0.bert4local.encoder.layer.0.output.LayerNorm.bias", "models.0.bert4local.encoder.layer.1.attention.self.query_proj.weight", "models.0.bert4local.encoder.layer.1.attention.self.query_proj.bias", "models.0.bert4local.encoder.layer.1.attention.self.key_proj.weight", "models.0.bert4local.encoder.layer.1.attention.self.key_proj.bias", "models.0.bert4local.encoder.layer.1.attention.self.value_proj.weight", "models.0.bert4local.encoder.layer.1.attention.self.value_proj.bias", "models.0.bert4local.encoder.layer.1.attention.output.dense.weight", "models.0.bert4local.encoder.layer.1.attention.output.dense.bias", "models.0.bert4local.encoder.layer.1.attention.output.LayerNorm.weight", "models.0.bert4local.encoder.layer.1.attention.output.LayerNorm.bias", "models.0.bert4local.encoder.layer.1.intermediate.dense.weight", "models.0.bert4local.encoder.layer.1.intermediate.dense.bias", "models.0.bert4local.encoder.layer.1.output.dense.weight", "models.0.bert4local.encoder.layer.1.output.dense.bias", "models.0.bert4local.encoder.layer.1.output.LayerNorm.weight", "models.0.bert4local.encoder.layer.1.output.LayerNorm.bias", 
"models.0.bert4local.encoder.layer.2.attention.self.query_proj.weight", "models.0.bert4local.encoder.layer.2.attention.self.query_proj.bias", "models.0.bert4local.encoder.layer.2.attention.self.key_proj.weight", "models.0.bert4local.encoder.layer.2.attention.self.key_proj.bias", "models.0.bert4local.encoder.layer.2.attention.self.value_proj.weight", "models.0.bert4local.encoder.layer.2.attention.self.value_proj.bias", "models.0.bert4local.encoder.layer.2.attention.output.dense.weight", "models.0.bert4local.encoder.layer.2.attention.output.dense.bias", "models.0.bert4local.encoder.layer.2.attention.output.LayerNorm.weight", "models.0.bert4local.encoder.layer.2.attention.output.LayerNorm.bias", "models.0.bert4local.encoder.layer.2.intermediate.dense.weight", "models.0.bert4local.encoder.layer.2.intermediate.dense.bias", "models.0.bert4local.encoder.layer.2.output.dense.weight", "models.0.bert4local.encoder.layer.2.output.dense.bias", "models.0.bert4local.encoder.layer.2.output.LayerNorm.weight", "models.0.bert4local.encoder.layer.2.output.LayerNorm.bias", "models.0.bert4local.encoder.layer.3.attention.self.query_proj.weight", "models.0.bert4local.encoder.layer.3.attention.self.query_proj.bias", "models.0.bert4local.encoder.layer.3.attention.self.key_proj.weight", "models.0.bert4local.encoder.layer.3.attention.self.key_proj.bias", "models.0.bert4local.encoder.layer.3.attention.self.value_proj.weight", "models.0.bert4local.encoder.layer.3.attention.self.value_proj.bias", "models.0.bert4local.encoder.layer.3.attention.output.dense.weight", "models.0.bert4local.encoder.layer.3.attention.output.dense.bias", "models.0.bert4local.encoder.layer.3.attention.output.LayerNorm.weight", "models.0.bert4local.encoder.layer.3.attention.output.LayerNorm.bias", "models.0.bert4local.encoder.layer.3.intermediate.dense.weight", "models.0.bert4local.encoder.layer.3.intermediate.dense.bias", "models.0.bert4local.encoder.layer.3.output.dense.weight", 
"models.0.bert4local.encoder.layer.3.output.dense.bias", "models.0.bert4local.encoder.layer.3.output.LayerNorm.weight", "models.0.bert4local.encoder.layer.3.output.LayerNorm.bias", "models.0.bert4local.encoder.layer.4.attention.self.query_proj.weight", "models.0.bert4local.encoder.layer.4.attention.self.query_proj.bias", "models.0.bert4local.encoder.layer.4.attention.self.key_proj.weight", "models.0.bert4local.encoder.layer.4.attention.self.key_proj.bias", "models.0.bert4local.encoder.layer.4.attention.self.value_proj.weight", "models.0.bert4local.encoder.layer.4.attention.self.value_proj.bias", "models.0.bert4local.encoder.layer.4.attention.output.dense.weight", "models.0.bert4local.encoder.layer.4.attention.output.dense.bias", "models.0.bert4local.encoder.layer.4.attention.output.LayerNorm.weight", "models.0.bert4local.encoder.layer.4.attention.output.LayerNorm.bias", "models.0.bert4local.encoder.layer.4.intermediate.dense.weight", "models.0.bert4local.encoder.layer.4.intermediate.dense.bias", "models.0.bert4local.encoder.layer.4.output.dense.weight", "models.0.bert4local.encoder.layer.4.output.dense.bias", "models.0.bert4local.encoder.layer.4.output.LayerNorm.weight", "models.0.bert4local.encoder.layer.4.output.LayerNorm.bias", "models.0.bert4local.encoder.layer.5.attention.self.query_proj.weight", "models.0.bert4local.encoder.layer.5.attention.self.query_proj.bias", "models.0.bert4local.encoder.layer.5.attention.self.key_proj.weight", "models.0.bert4local.encoder.layer.5.attention.self.key_proj.bias", "models.0.bert4local.encoder.layer.5.attention.self.value_proj.weight", "models.0.bert4local.encoder.layer.5.attention.self.value_proj.bias", "models.0.bert4local.encoder.layer.5.attention.output.dense.weight", "models.0.bert4local.encoder.layer.5.attention.output.dense.bias", "models.0.bert4local.encoder.layer.5.attention.output.LayerNorm.weight", "models.0.bert4local.encoder.layer.5.attention.output.LayerNorm.bias", 
"models.0.bert4local.encoder.layer.5.intermediate.dense.weight", "models.0.bert4local.encoder.layer.5.intermediate.dense.bias", "models.0.bert4local.encoder.layer.5.output.dense.weight", "models.0.bert4local.encoder.layer.5.output.dense.bias", "models.0.bert4local.encoder.layer.5.output.LayerNorm.weight", "models.0.bert4local.encoder.layer.5.output.LayerNorm.bias", "models.0.bert4local.encoder.layer.6.attention.self.query_proj.weight", "models.0.bert4local.encoder.layer.6.attention.self.query_proj.bias", "models.0.bert4local.encoder.layer.6.attention.self.key_proj.weight", "models.0.bert4local.encoder.layer.6.attention.self.key_proj.bias", "models.0.bert4local.encoder.layer.6.attention.self.value_proj.weight", "models.0.bert4local.encoder.layer.6.attention.self.value_proj.bias", "models.0.bert4local.encoder.layer.6.attention.output.dense.weight", "models.0.bert4local.encoder.layer.6.attention.output.dense.bias", "models.0.bert4local.encoder.layer.6.attention.output.LayerNorm.weight", "models.0.bert4local.encoder.layer.6.attention.output.LayerNorm.bias", "models.0.bert4local.encoder.layer.6.intermediate.dense.weight", "models.0.bert4local.encoder.layer.6.intermediate.dense.bias", "models.0.bert4local.encoder.layer.6.output.dense.weight", "models.0.bert4local.encoder.layer.6.output.dense.bias", "models.0.bert4local.encoder.layer.6.output.LayerNorm.weight", "models.0.bert4local.encoder.layer.6.output.LayerNorm.bias", "models.0.bert4local.encoder.layer.7.attention.self.query_proj.weight", "models.0.bert4local.encoder.layer.7.attention.self.query_proj.bias", "models.0.bert4local.encoder.layer.7.attention.self.key_proj.weight", "models.0.bert4local.encoder.layer.7.attention.self.key_proj.bias", "models.0.bert4local.encoder.layer.7.attention.self.value_proj.weight", "models.0.bert4local.encoder.layer.7.attention.self.value_proj.bias", "models.0.bert4local.encoder.layer.7.attention.output.dense.weight", "models.0.bert4local.encoder.layer.7.attention.output.dense.bias", 
"models.0.bert4local.encoder.layer.7.attention.output.LayerNorm.weight", "models.0.bert4local.encoder.layer.7.attention.output.LayerNorm.bias", "models.0.bert4local.encoder.layer.7.intermediate.dense.weight", "models.0.bert4local.encoder.layer.7.intermediate.dense.bias", "models.0.bert4local.encoder.layer.7.output.dense.weight", "models.0.bert4local.encoder.layer.7.output.dense.bias", "models.0.bert4local.encoder.layer.7.output.LayerNorm.weight", "models.0.bert4local.encoder.layer.7.output.LayerNorm.bias", "models.0.bert4local.encoder.layer.8.attention.self.query_proj.weight", "models.0.bert4local.encoder.layer.8.attention.self.query_proj.bias", "models.0.bert4local.encoder.layer.8.attention.self.key_proj.weight", "models.0.bert4local.encoder.layer.8.attention.self.key_proj.bias", "models.0.bert4local.encoder.layer.8.attention.self.value_proj.weight", "models.0.bert4local.encoder.layer.8.attention.self.value_proj.bias", "models.0.bert4local.encoder.layer.8.attention.output.dense.weight", "models.0.bert4local.encoder.layer.8.attention.output.dense.bias", "models.0.bert4local.encoder.layer.8.attention.output.LayerNorm.weight", "models.0.bert4local.encoder.layer.8.attention.output.LayerNorm.bias", "models.0.bert4local.encoder.layer.8.intermediate.dense.weight", "models.0.bert4local.encoder.layer.8.intermediate.dense.bias", "models.0.bert4local.encoder.layer.8.output.dense.weight", "models.0.bert4local.encoder.layer.8.output.dense.bias", "models.0.bert4local.encoder.layer.8.output.LayerNorm.weight", "models.0.bert4local.encoder.layer.8.output.LayerNorm.bias", "models.0.bert4local.encoder.layer.9.attention.self.query_proj.weight", "models.0.bert4local.encoder.layer.9.attention.self.query_proj.bias", "models.0.bert4local.encoder.layer.9.attention.self.key_proj.weight", "models.0.bert4local.encoder.layer.9.attention.self.key_proj.bias", "models.0.bert4local.encoder.layer.9.attention.self.value_proj.weight", 
"models.0.bert4local.encoder.layer.9.attention.self.value_proj.bias", "models.0.bert4local.encoder.layer.9.attention.output.dense.weight", "models.0.bert4local.encoder.layer.9.attention.output.dense.bias", "models.0.bert4local.encoder.layer.9.attention.output.LayerNorm.weight", "models.0.bert4local.encoder.layer.9.attention.output.LayerNorm.bias", "models.0.bert4local.encoder.layer.9.intermediate.dense.weight", "models.0.bert4local.encoder.layer.9.intermediate.dense.bias", "models.0.bert4local.encoder.layer.9.output.dense.weight", "models.0.bert4local.encoder.layer.9.output.dense.bias", "models.0.bert4local.encoder.layer.9.output.LayerNorm.weight", "models.0.bert4local.encoder.layer.9.output.LayerNorm.bias", "models.0.bert4local.encoder.layer.10.attention.self.query_proj.weight", "models.0.bert4local.encoder.layer.10.attention.self.query_proj.bias", "models.0.bert4local.encoder.layer.10.attention.self.key_proj.weight", "models.0.bert4local.encoder.layer.10.attention.self.key_proj.bias", "models.0.bert4local.encoder.layer.10.attention.self.value_proj.weight", "models.0.bert4local.encoder.layer.10.attention.self.value_proj.bias", "models.0.bert4local.encoder.layer.10.attention.output.dense.weight", "models.0.bert4local.encoder.layer.10.attention.output.dense.bias", "models.0.bert4local.encoder.layer.10.attention.output.LayerNorm.weight", "models.0.bert4local.encoder.layer.10.attention.output.LayerNorm.bias", "models.0.bert4local.encoder.layer.10.intermediate.dense.weight", "models.0.bert4local.encoder.layer.10.intermediate.dense.bias", "models.0.bert4local.encoder.layer.10.output.dense.weight", "models.0.bert4local.encoder.layer.10.output.dense.bias", "models.0.bert4local.encoder.layer.10.output.LayerNorm.weight", "models.0.bert4local.encoder.layer.10.output.LayerNorm.bias", "models.0.bert4local.encoder.layer.11.attention.self.query_proj.weight", "models.0.bert4local.encoder.layer.11.attention.self.query_proj.bias", 
"models.0.bert4local.encoder.layer.11.attention.self.key_proj.weight", "models.0.bert4local.encoder.layer.11.attention.self.key_proj.bias", "models.0.bert4local.encoder.layer.11.attention.self.value_proj.weight", "models.0.bert4local.encoder.layer.11.attention.self.value_proj.bias", "models.0.bert4local.encoder.layer.11.attention.output.dense.weight", "models.0.bert4local.encoder.layer.11.attention.output.dense.bias", "models.0.bert4local.encoder.layer.11.attention.output.LayerNorm.weight", "models.0.bert4local.encoder.layer.11.attention.output.LayerNorm.bias", "models.0.bert4local.encoder.layer.11.intermediate.dense.weight", "models.0.bert4local.encoder.layer.11.intermediate.dense.bias", "models.0.bert4local.encoder.layer.11.output.dense.weight", "models.0.bert4local.encoder.layer.11.output.dense.bias", "models.0.bert4local.encoder.layer.11.output.LayerNorm.weight", "models.0.bert4local.encoder.layer.11.output.LayerNorm.bias", "models.0.bert4local.encoder.rel_embeddings.weight", "models.0.bert4local.encoder.LayerNorm.weight", "models.0.bert4local.encoder.LayerNorm.bias", "models.0.bert_SA.encoder.0.SA.query.weight", "models.0.bert_SA.encoder.0.SA.query.bias", "models.0.bert_SA.encoder.0.SA.key.weight", "models.0.bert_SA.encoder.0.SA.key.bias", "models.0.bert_SA.encoder.0.SA.value.weight", "models.0.bert_SA.encoder.0.SA.value.bias", "models.0.linear2.weight", "models.0.linear2.bias", "models.0.bert_SA_.encoder.0.SA.query.weight", "models.0.bert_SA_.encoder.0.SA.query.bias", "models.0.bert_SA_.encoder.0.SA.key.weight", "models.0.bert_SA_.encoder.0.SA.key.bias", "models.0.bert_SA_.encoder.0.SA.value.weight", "models.0.bert_SA_.encoder.0.SA.value.bias", "models.0.bert_pooler.dense.weight", "models.0.bert_pooler.dense.bias", "models.0.dense.weight", "models.0.dense.bias", "bert.embeddings.position_ids", "bert.embeddings.word_embeddings.weight", "bert.embeddings.LayerNorm.weight", "bert.embeddings.LayerNorm.bias", "bert.encoder.layer.0.attention.self.query_proj.weight", 
"bert.encoder.layer.0.attention.self.query_proj.bias", "bert.encoder.layer.0.attention.self.key_proj.weight", "bert.encoder.layer.0.attention.self.key_proj.bias", "bert.encoder.layer.0.attention.self.value_proj.weight", "bert.encoder.layer.0.attention.self.value_proj.bias", "bert.encoder.layer.0.attention.output.dense.weight", "bert.encoder.layer.0.attention.output.dense.bias", "bert.encoder.layer.0.attention.output.LayerNorm.weight", "bert.encoder.layer.0.attention.output.LayerNorm.bias", "bert.encoder.layer.0.intermediate.dense.weight", "bert.encoder.layer.0.intermediate.dense.bias", "bert.encoder.layer.0.output.dense.weight", "bert.encoder.layer.0.output.dense.bias", "bert.encoder.layer.0.output.LayerNorm.weight", "bert.encoder.layer.0.output.LayerNorm.bias", "bert.encoder.layer.1.attention.self.query_proj.weight", "bert.encoder.layer.1.attention.self.query_proj.bias", "bert.encoder.layer.1.attention.self.key_proj.weight", "bert.encoder.layer.1.attention.self.key_proj.bias", "bert.encoder.layer.1.attention.self.value_proj.weight", "bert.encoder.layer.1.attention.self.value_proj.bias", "bert.encoder.layer.1.attention.output.dense.weight", "bert.encoder.layer.1.attention.output.dense.bias", "bert.encoder.layer.1.attention.output.LayerNorm.weight", "bert.encoder.layer.1.attention.output.LayerNorm.bias", "bert.encoder.layer.1.intermediate.dense.weight", "bert.encoder.layer.1.intermediate.dense.bias", "bert.encoder.layer.1.output.dense.weight", "bert.encoder.layer.1.output.dense.bias", "bert.encoder.layer.1.output.LayerNorm.weight", "bert.encoder.layer.1.output.LayerNorm.bias", "bert.encoder.layer.2.attention.self.query_proj.weight", "bert.encoder.layer.2.attention.self.query_proj.bias", "bert.encoder.layer.2.attention.self.key_proj.weight", "bert.encoder.layer.2.attention.self.key_proj.bias", "bert.encoder.layer.2.attention.self.value_proj.weight", "bert.encoder.layer.2.attention.self.value_proj.bias", "bert.encoder.layer.2.attention.output.dense.weight", 
"bert.encoder.layer.2.attention.output.dense.bias", "bert.encoder.layer.2.attention.output.LayerNorm.weight", "bert.encoder.layer.2.attention.output.LayerNorm.bias", "bert.encoder.layer.2.intermediate.dense.weight", "bert.encoder.layer.2.intermediate.dense.bias", "bert.encoder.layer.2.output.dense.weight", "bert.encoder.layer.2.output.dense.bias", "bert.encoder.layer.2.output.LayerNorm.weight", "bert.encoder.layer.2.output.LayerNorm.bias", "bert.encoder.layer.3.attention.self.query_proj.weight", "bert.encoder.layer.3.attention.self.query_proj.bias", "bert.encoder.layer.3.attention.self.key_proj.weight", "bert.encoder.layer.3.attention.self.key_proj.bias", "bert.encoder.layer.3.attention.self.value_proj.weight", "bert.encoder.layer.3.attention.self.value_proj.bias", "bert.encoder.layer.3.attention.output.dense.weight", "bert.encoder.layer.3.attention.output.dense.bias", "bert.encoder.layer.3.attention.output.LayerNorm.weight", "bert.encoder.layer.3.attention.output.LayerNorm.bias", "bert.encoder.layer.3.intermediate.dense.weight", "bert.encoder.layer.3.intermediate.dense.bias", "bert.encoder.layer.3.output.dense.weight", "bert.encoder.layer.3.output.dense.bias", "bert.encoder.layer.3.output.LayerNorm.weight", "bert.encoder.layer.3.output.LayerNorm.bias", "bert.encoder.layer.4.attention.self.query_proj.weight", "bert.encoder.layer.4.attention.self.query_proj.bias", "bert.encoder.layer.4.attention.self.key_proj.weight", "bert.encoder.layer.4.attention.self.key_proj.bias", "bert.encoder.layer.4.attention.self.value_proj.weight", "bert.encoder.layer.4.attention.self.value_proj.bias", "bert.encoder.layer.4.attention.output.dense.weight", "bert.encoder.layer.4.attention.output.dense.bias", "bert.encoder.layer.4.attention.output.LayerNorm.weight", "bert.encoder.layer.4.attention.output.LayerNorm.bias", "bert.encoder.layer.4.intermediate.dense.weight", "bert.encoder.layer.4.intermediate.dense.bias", "bert.encoder.layer.4.output.dense.weight", 
"bert.encoder.layer.4.output.dense.bias", "bert.encoder.layer.4.output.LayerNorm.weight", "bert.encoder.layer.4.output.LayerNorm.bias", "bert.encoder.layer.5.attention.self.query_proj.weight", "bert.encoder.layer.5.attention.self.query_proj.bias", "bert.encoder.layer.5.attention.self.key_proj.weight", "bert.encoder.layer.5.attention.self.key_proj.bias", "bert.encoder.layer.5.attention.self.value_proj.weight", "bert.encoder.layer.5.attention.self.value_proj.bias", "bert.encoder.layer.5.attention.output.dense.weight", "bert.encoder.layer.5.attention.output.dense.bias", "bert.encoder.layer.5.attention.output.LayerNorm.weight", "bert.encoder.layer.5.attention.output.LayerNorm.bias", "bert.encoder.layer.5.intermediate.dense.weight", "bert.encoder.layer.5.intermediate.dense.bias", "bert.encoder.layer.5.output.dense.weight", "bert.encoder.layer.5.output.dense.bias", "bert.encoder.layer.5.output.LayerNorm.weight", "bert.encoder.layer.5.output.LayerNorm.bias", "bert.encoder.layer.6.attention.self.query_proj.weight", "bert.encoder.layer.6.attention.self.query_proj.bias", "bert.encoder.layer.6.attention.self.key_proj.weight", "bert.encoder.layer.6.attention.self.key_proj.bias", "bert.encoder.layer.6.attention.self.value_proj.weight", "bert.encoder.layer.6.attention.self.value_proj.bias", "bert.encoder.layer.6.attention.output.dense.weight", "bert.encoder.layer.6.attention.output.dense.bias", "bert.encoder.layer.6.attention.output.LayerNorm.weight", "bert.encoder.layer.6.attention.output.LayerNorm.bias", "bert.encoder.layer.6.intermediate.dense.weight", "bert.encoder.layer.6.intermediate.dense.bias", "bert.encoder.layer.6.output.dense.weight", "bert.encoder.layer.6.output.dense.bias", "bert.encoder.layer.6.output.LayerNorm.weight", "bert.encoder.layer.6.output.LayerNorm.bias", "bert.encoder.layer.7.attention.self.query_proj.weight", "bert.encoder.layer.7.attention.self.query_proj.bias", "bert.encoder.layer.7.attention.self.key_proj.weight", 
"bert.encoder.layer.7.attention.self.key_proj.bias", "bert.encoder.layer.7.attention.self.value_proj.weight", "bert.encoder.layer.7.attention.self.value_proj.bias", "bert.encoder.layer.7.attention.output.dense.weight", "bert.encoder.layer.7.attention.output.dense.bias", "bert.encoder.layer.7.attention.output.LayerNorm.weight", "bert.encoder.layer.7.attention.output.LayerNorm.bias", "bert.encoder.layer.7.intermediate.dense.weight", "bert.encoder.layer.7.intermediate.dense.bias", "bert.encoder.layer.7.output.dense.weight", "bert.encoder.layer.7.output.dense.bias", "bert.encoder.layer.7.output.LayerNorm.weight", "bert.encoder.layer.7.output.LayerNorm.bias", "bert.encoder.layer.8.attention.self.query_proj.weight", "bert.encoder.layer.8.attention.self.query_proj.bias", "bert.encoder.layer.8.attention.self.key_proj.weight", "bert.encoder.layer.8.attention.self.key_proj.bias", "bert.encoder.layer.8.attention.self.value_proj.weight", "bert.encoder.layer.8.attention.self.value_proj.bias", "bert.encoder.layer.8.attention.output.dense.weight", "bert.encoder.layer.8.attention.output.dense.bias", "bert.encoder.layer.8.attention.output.LayerNorm.weight", "bert.encoder.layer.8.attention.output.LayerNorm.bias", "bert.encoder.layer.8.intermediate.dense.weight", "bert.encoder.layer.8.intermediate.dense.bias", "bert.encoder.layer.8.output.dense.weight", "bert.encoder.layer.8.output.dense.bias", "bert.encoder.layer.8.output.LayerNorm.weight", "bert.encoder.layer.8.output.LayerNorm.bias", "bert.encoder.layer.9.attention.self.query_proj.weight", "bert.encoder.layer.9.attention.self.query_proj.bias", "bert.encoder.layer.9.attention.self.key_proj.weight", "bert.encoder.layer.9.attention.self.key_proj.bias", "bert.encoder.layer.9.attention.self.value_proj.weight", "bert.encoder.layer.9.attention.self.value_proj.bias", "bert.encoder.layer.9.attention.output.dense.weight", "bert.encoder.layer.9.attention.output.dense.bias", "bert.encoder.layer.9.attention.output.LayerNorm.weight", 
"bert.encoder.layer.9.attention.output.LayerNorm.bias", "bert.encoder.layer.9.intermediate.dense.weight", "bert.encoder.layer.9.intermediate.dense.bias", "bert.encoder.layer.9.output.dense.weight", "bert.encoder.layer.9.output.dense.bias", "bert.encoder.layer.9.output.LayerNorm.weight", "bert.encoder.layer.9.output.LayerNorm.bias", "bert.encoder.layer.10.attention.self.query_proj.weight", "bert.encoder.layer.10.attention.self.query_proj.bias", "bert.encoder.layer.10.attention.self.key_proj.weight", "bert.encoder.layer.10.attention.self.key_proj.bias", "bert.encoder.layer.10.attention.self.value_proj.weight", "bert.encoder.layer.10.attention.self.value_proj.bias", "bert.encoder.layer.10.attention.output.dense.weight", "bert.encoder.layer.10.attention.output.dense.bias", "bert.encoder.layer.10.attention.output.LayerNorm.weight", "bert.encoder.layer.10.attention.output.LayerNorm.bias", "bert.encoder.layer.10.intermediate.dense.weight", "bert.encoder.layer.10.intermediate.dense.bias", "bert.encoder.layer.10.output.dense.weight", "bert.encoder.layer.10.output.dense.bias", "bert.encoder.layer.10.output.LayerNorm.weight", "bert.encoder.layer.10.output.LayerNorm.bias", "bert.encoder.layer.11.attention.self.query_proj.weight", "bert.encoder.layer.11.attention.self.query_proj.bias", "bert.encoder.layer.11.attention.self.key_proj.weight", "bert.encoder.layer.11.attention.self.key_proj.bias", "bert.encoder.layer.11.attention.self.value_proj.weight", "bert.encoder.layer.11.attention.self.value_proj.bias", "bert.encoder.layer.11.attention.output.dense.weight", "bert.encoder.layer.11.attention.output.dense.bias", "bert.encoder.layer.11.attention.output.LayerNorm.weight", "bert.encoder.layer.11.attention.output.LayerNorm.bias", "bert.encoder.layer.11.intermediate.dense.weight", "bert.encoder.layer.11.intermediate.dense.bias", "bert.encoder.layer.11.output.dense.weight", "bert.encoder.layer.11.output.dense.bias", "bert.encoder.layer.11.output.LayerNorm.weight", 
"bert.encoder.layer.11.output.LayerNorm.bias", "bert.encoder.rel_embeddings.weight", "bert.encoder.LayerNorm.weight", "bert.encoder.LayerNorm.bias", "dense.weight", "dense.bias".
This is not the expected usage: yangheng/deberta-v3-base-absa-v1.1 is a stand-alone model (a universal architecture from transformers) for aspect sentiment classification, whereas this state_dict was saved from the architecture of PyABSA's APC models, so the keys cannot match.
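To illustrate the mismatch, here is a minimal stdlib-only sketch (with a hypothetical helper name) of the check torch performs in strict mode: loading succeeds only when the checkpoint's keys exactly match the model's parameter names, which an APC state_dict (prefixed with e.g. `models.0.bert4local.`) never will against a plain `DebertaV2Model`.

```python
# Hypothetical sketch of torch's strict-mode key check: loading succeeds
# only when checkpoint keys exactly match the model's parameter names.
def check_strict(expected_keys, checkpoint_keys):
    missing = sorted(set(expected_keys) - set(checkpoint_keys))
    unexpected = sorted(set(checkpoint_keys) - set(expected_keys))
    return missing, unexpected

# The plain DebertaV2Model expects unprefixed encoder names...
expected = [
    "embeddings.word_embeddings.weight",
    "encoder.layer.0.attention.self.query_proj.weight",
]
# ...but the PyABSA APC checkpoint wraps them in its own module hierarchy.
checkpoint = [
    "models.0.bert4local.embeddings.word_embeddings.weight",
    "models.0.bert4local.encoder.layer.0.attention.self.query_proj.weight",
]

missing, unexpected = check_strict(expected, checkpoint)
print(missing)     # every expected key is reported as missing
print(unexpected)  # every checkpoint key is reported as unexpected
```

With no overlap at all between the two key sets, every key lands in one of the two error lists, which is exactly the `Missing key(s)` / `Unexpected key(s)` output above.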
I'm getting the exact same error when running the code provided in the tutorial. I therefore dug into the code to identify the problem, and have included the section where the error occurs.
from pyabsa import AspectPolarityClassification as APC
from pyabsa import DeviceTypeOption, ModelSaveOption

config = APC.APCConfigManager.get_apc_config_english()
config.num_epoch = 1
config.model = APC.APCModelList.FAST_LSA_T_V2
dataset = APC.APCDatasetList.Laptop14

trainer = APC.APCTrainer(
    config=config,
    dataset=dataset,
    from_checkpoint="english",
    # pass a checkpoint name here to resume training from a pretrained checkpoint
    auto_device=DeviceTypeOption.AUTO,
    path_to_save=None,  # path for saving checkpoints; None uses the default 'checkpoints' folder
    checkpoint_save_mode=ModelSaveOption.SAVE_MODEL_STATE_DICT,
    load_aug=False,
    # some integrated datasets ship augmentation data; set load_aug=True to use it and improve performance
)
Can you try version 2.3.4rc0? This error is triggered because torch loads the checkpoint in strict mode, while the transformers 4.3x releases refactored the code to remove the position_ids buffer.
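For anyone stuck on an older PyABSA version, one workaround is to drop the stale buffer key before loading (or to pass `strict=False` to `load_state_dict`). A stdlib-only sketch, using a hypothetical helper name and toy values in place of tensors:

```python
# Hypothetical helper: newer transformers releases no longer register
# "position_ids" as a persistent buffer, so checkpoints saved with older
# versions carry a key the rebuilt model does not declare. Filtering it
# out avoids the strict-mode mismatch.
def drop_stale_keys(state_dict, stale_suffixes=("position_ids",)):
    return {
        key: value
        for key, value in state_dict.items()
        if not any(key.endswith(suffix) for suffix in stale_suffixes)
    }

# Toy stand-in for a loaded state_dict (real values would be tensors).
state_dict = {
    "bert.embeddings.position_ids": "stale buffer",
    "bert.embeddings.word_embeddings.weight": "weights",
}
cleaned = drop_stale_keys(state_dict)
print(sorted(cleaned))  # only the word-embedding key survives
```

The same filtering could be applied to a real checkpoint between `torch.load(...)` and `load_state_dict(...)`, though upgrading as suggested above is the cleaner fix.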
Working now! Thanks for fixing this! :)