Reproducing Experiment Results Using the Provided Checkpoint
GMago-LeWay opened this issue · comments
Hello!
I downloaded the trained checkpoint linked in the README and ran inference on the test set to reproduce the results.
The results given in the README are (EM / F0.5 : 34.10 / 45.48), but my results (using run_stg_joint.sh) are (EM / F0.5 : 50.5 / 37.9). This difference is too large to ignore.
I did adjust some code for inference:
- In line 16 of FCGEC/model/STG-correction/Model/tagger_model.py, I had to change self.max_token = args.max_generate + 1 to self.max_token = args.max_generate. Otherwise, the parameter shape of self._hidden2t in the checkpoint does not match the constructed model.
- In line 46 of FCGEC-main/model/STG-correction/preprocess_data.py, some additional code is needed because the "uid" of every sentence is essential in the test process. I therefore added an extra key column to test.csv and copied it to stg_joint_test.xlsx. I used this Excel file to build the final submission. My results are in the row GMago on the Codalab results page.
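For readers following the second adjustment, carrying the uid through to the output could be sketched roughly as below. This is not the repository's code: the function name attach_uids, the column names "sentence" and "prediction", and the sample rows are hypothetical, and the sketch assumes that predictions are returned in the same order as the rows of test.csv.

```python
import csv
import io


def attach_uids(test_csv_text, predictions):
    """Pair each prediction with the uid of the test row at the same position."""
    rows = list(csv.DictReader(io.StringIO(test_csv_text)))
    if len(rows) != len(predictions):
        raise ValueError("row count and prediction count differ")
    return [
        {"uid": row["uid"], "prediction": pred}
        for row, pred in zip(rows, predictions)
    ]


# Hypothetical test.csv content that already contains the extra uid column.
test_csv = "uid,sentence\nid-001,first sentence\nid-002,second sentence\n"
submission = attach_uids(test_csv, ["first corrected", "second corrected"])
print(submission)
```

Matching by position is the simplest option; if the evaluation code shuffles or filters rows, the uid would have to travel with each example instead.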
Hi, thanks for your feedback! The responses to your questions are as follows:
- There is a mistake here: the released checkpoint was actually trained with max_generate set to 5. After the rebuttal, we computed the distribution of the data and concluded that 6 would be more appropriate, so we recommend re-running with max_generate = 6. Thank you for pointing out the problem; I will update the README afterwards.
- This result looks a bit odd. I will download your submission from Codalab and check it to find the problem. It may take some time; I will reply here after I have checked it.
Hello, I have identified the reason for the performance difference in our Codalab system. We are very sorry for the error in our scoring program. More details are given below:
For the correction metrics, we compute scores only on erroneous sentences, so the correct sentences must be filtered out first (based on the error_flag attribute in the golden label file). While developing the scoring program, I mistakenly used the error_flag of the prediction file instead of that of the golden label file. This produced wrong values for two metrics (corr_ex and corr_f0.5).
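The effect of filtering on the wrong file can be illustrated with a small sketch. It is not the actual Codalab scoring program: only the error_flag attribute comes from the description above, while the function exact_match, the "correction" field, and the toy data are assumptions made for illustration.

```python
def exact_match(gold, pred, flags):
    """Exact-match rate over the sentences that `flags` marks as erroneous."""
    selected = [uid for uid, item in flags.items() if item["error_flag"] == 1]
    if not selected:
        return 0.0
    hits = sum(pred[uid]["correction"] == gold[uid]["correction"] for uid in selected)
    return hits / len(selected)


# Toy data: the gold labels mark "a" and "b" as erroneous,
# but the prediction only flags "a".
gold = {
    "a": {"error_flag": 1, "correction": "X"},
    "b": {"error_flag": 1, "correction": "Y"},
    "c": {"error_flag": 0, "correction": "Z"},
}
pred = {
    "a": {"error_flag": 1, "correction": "X"},
    "b": {"error_flag": 0, "correction": "b unchanged"},
    "c": {"error_flag": 0, "correction": "Z"},
}

fixed = exact_match(gold, pred, flags=gold)  # filter by the golden label file
buggy = exact_match(gold, pred, flags=pred)  # filter by the prediction file (the bug)
print(fixed, buggy)
```

In this toy case the prediction-based filter silently drops sentence "b", which the model missed, so the score changes even though the predictions are identical.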
We have fixed the bug, and you can resubmit the previous predict.zip file for re-testing. The results will be:
Meanwhile, I have updated the py and bash files to add the uid to the output file.
Thank you for your feedback! If you cannot reproduce our results on Codalab, feel free to comment here.
I made a submission and now the results of the given checkpoint are consistent with README (EM / F0.5 : 34.10 / 45.48).
Thanks for your reply!
Hello!
I downloaded the checkpoint and pretrained_model and changed the paths to my own, but I still get an error:
"joint_evaluate.py: error: argument --lm_path: expected one argument"
How can I solve it?
Thanks!
Hi, it seems you have used multiple values for lm_path, which stands for the path to the pre-trained language model. Can you share the complete bash script or the command?
#!/bin/bash
# Copyright 2022 The ZJU MMF Authors (Lvxiaowei Xu, Jianwang Wu, Jiawei Peng, Jiayu Fu and Ming Cai *).
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Train and Test for STG-Joint

# Global Variable (!!! SHOULD ADAPT TO YOUR CONFIGURATION !!!)
CUDA_ID=1
SEED=2022
EPOCH=50
BATCH_SIZE=32
MAX_GENERATE=5 # MAX T
SPECIAL_MAPPING=false # More details can be found in ISSUE 10
CHECKPOINT_DIR=checkpoints
# Roberta-base-chinese can be downloaded at https://github.com/ymcui/Chinese-BERT-wwm
#PLM_PATH=/datadisk2/xlxw/Resources/pretrained_models/roberta-base-chinese/ # pretrained-model path
PLM_PATH= /pretrained_models/chinese-roberta-wwm-ext/
OUTPUT_PATH=stg_joint_test.xlsx
JOINT_CHECK_DIR=1021_jointmodel_stg

# STEP 1 - PREPROCESS DATASET
#DATA_BASE_DIR=dataset
#DATA_OUT_DIR=stg_joint
#DATA_TRAIN_FILE=FCGEC_train.json
#DATA_VALID_FILE=FCGEC_valid.json
#DATA_TEST_FILE=FCGEC_test.json
#python preprocess_data.py --mode normal --err_only True
#--data_dir ${DATA_BASE_DIR} --out_dir ${DATA_OUT_DIR}
#--train_file ${DATA_TRAIN_FILE} --valid_file ${DATA_VALID_FILE} --test_file ${DATA_TEST_FILE}

# STEP 2 - TRAIN STG-Joint MODEL
#python joint_stg.py --mode train
#--gpu_id ${CUDA_ID}
#--seed ${SEED}
#--checkpoints ${CHECKPOINT_DIR}
#--checkp ${JOINT_CHECK_DIR}
#--data_base_dir ${DATA_BASE_DIR}/${DATA_OUT_DIR}
#--lm_path ${PLM_PATH}
#--batch_size ${BATCH_SIZE}
#--epoch ${EPOCH}
#--max_generate ${MAX_GENERATE}

# STEP 3 - TRAIN STG-Joint MODEL
python joint_evaluate.py --mode test --gpu_id ${CUDA_ID} --seed ${SEED} \
--checkpoints ${CHECKPOINT_DIR} --checkp ${JOINT_CHECK_DIR} \
--export ${OUTPUT_PATH} \
--data_base_dir ${DATA_BASE_DIR}/${DATA_OUT_DIR} \
--max_generate ${MAX_GENERATE} \
--lm_path ${PLM_PATH} \
--batch_size ${BATCH_SIZE} \
--sp_map ${SPECIAL_MAPPING}
run: sh run_stg_joint.sh
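A note on the error message itself: Python's argparse reports "expected one argument" when a flag that takes a value is followed by no value, or directly by another flag. The script as posted shows a space after PLM_PATH=. If that space is really in the file, and is not an artifact of pasting, the shell would leave PLM_PATH empty, so --lm_path ${PLM_PATH} would expand to a bare --lm_path. The sketch below reproduces the message with plain argparse. It assumes the repository's ArgumentGroup is a thin wrapper around argparse, and the parser defined here is a simplified stand-in, not the real joint_evaluate.py.

```python
import argparse


class RaisingParser(argparse.ArgumentParser):
    """ArgumentParser that raises instead of exiting, so the message can be inspected."""

    def error(self, message):
        raise ValueError(message)


def parse(argv):
    # Simplified stand-in for the real argument definitions (assumption).
    parser = RaisingParser(prog="joint_evaluate.py")
    parser.add_argument("--lm_path", type=str, default="")
    parser.add_argument("--batch_size", type=int, default=32)
    try:
        return vars(parser.parse_args(argv))
    except ValueError as exc:
        return str(exc)


# A value is supplied: parsing succeeds.
ok = parse(["--lm_path", "/pretrained_models/chinese-roberta-wwm-ext/", "--batch_size", "32"])
# The variable expanded to nothing, so --lm_path is followed directly by another flag.
empty = parse(["--lm_path", "--batch_size", "32"])
# For comparison: two values after the flag produce a different message.
extra = parse(["--lm_path", "a", "b"])

print(ok)
print(empty)
print(extra)
```

Quoting the expansion in the script ("${PLM_PATH}") would pass an empty string instead of nothing, which usually makes this kind of problem easier to spot.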
The error I can currently find is that you have commented out two parameters, DATA_BASE_DIR and DATA_OUT_DIR. This will prevent joint_evaluate.py from running properly, but the configuration for lm_path seems to be correct. Have you tried taking the joint_evaluate.py line out of the bash script and running it directly on the command line?
Sorry!
In evaluate_joint_config.py, I forgot to modify the parameters:
# Pretrained Model Params
pretrained_args = ArgumentGroup(parser, 'pretrained', 'Pretrained Model Settings')
pretrained_args.add_arg('use_lm', bool, True, 'Whether Model Use Language Models')
############################
# pretrained_args.add_arg('lm_path', str, '/datadisk2/xlxw/Resources/pretrained_models/roberta-base-chinese', 'Bert Pretrained Model Path')
pretrained_args.add_arg('lm_path', str, './pretrained_models/chinese-roberta-wwm-ext/', 'Bert Pretrained Model Path')
############################
pretrained_args.add_arg('lm_hidden_size', int, 768, 'HiddenSize of PLM')
pretrained_args.add_arg('output_hidden_states', bool, True, 'Output PLM Hidden States')
pretrained_args.add_arg('finetune', bool, True, 'Finetune Or Freeze')
But I encountered a new problem:
Some weights of the model checkpoint at ./pretrained_models/chinese-roberta-wwm-ext/ were not used when initializing BertModel: ['cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.predictions.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
RuntimeError: Error(s) in loading state_dict for JointModel:
size mismatch for tagger._hidden2t.linear.weight: copying a param with shape torch.Size([6, 768]) from checkpoint, the shape in current model is torch.Size([7, 768]).
size mismatch for tagger._hidden2t.linear.bias: copying a param with shape torch.Size([6]) from checkpoint, the shape in current model is torch.Size([7]).
The pre-trained language model uses https://huggingface.co/hfl/chinese-roberta-wwm-ext/
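The two shapes in the RuntimeError can be related to max_generate with a little arithmetic. The sketch below assumes that the first dimension of tagger._hidden2t equals self.max_token and that self.max_token = args.max_generate + 1, as quoted from tagger_model.py earlier in this thread; the helper names are hypothetical.

```python
HIDDEN_SIZE = 768  # lm_hidden_size from the config shown above


def tagger_weight_shape(max_generate, hidden_size=HIDDEN_SIZE):
    """Shape of the tagger output weight, assuming max_token = max_generate + 1."""
    return (max_generate + 1, hidden_size)


def max_generate_for_checkpoint(weight_shape):
    """Invert the relation above to recover the max_generate a checkpoint was trained with."""
    return weight_shape[0] - 1


checkpoint_shape = (6, 768)  # "copying a param with shape torch.Size([6, 768])"
model_shape = (7, 768)       # "the shape in current model is torch.Size([7, 768])"

print(max_generate_for_checkpoint(checkpoint_shape))
print(max_generate_for_checkpoint(model_shape))
print(tagger_weight_shape(5) == checkpoint_shape)
```

Under these assumptions the checkpoint corresponds to max_generate = 5 and the constructed model to max_generate = 6, which agrees with the maintainer's statement that the released checkpoint was trained with 5.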
Thank you, I solved it. In the py file, I set the Number of Tagger Classes to 5 and the Number of Max Token Generation to 5. Thank you very much.
You're welcome.