jerryji1993 / DNABERT

DNABERT: pre-trained Bidirectional Encoder Representations from Transformers model for DNA-language in genome

Home Page: https://doi.org/10.1093/bioinformatics/btab083


Error when loading pretrained model for fine-tuning from a checkpoint of the pretrained model

danarte opened this issue · comments

Hi,
Very simple issue: the error
"ValueError: loaded state dict contains a parameter group that doesn't match the size of optimizer's group"
is raised when I try to load a pre-trained model for fine-tuning from a checkpoint folder located inside the pre-trained model's output folder. When I load the model from the "root" folder of the pre-trained model (the one that contains the checkpoint subfolders), fine-tuning runs fine.
The error is thrown before training starts.

To reproduce, simply follow the steps in the example in README.md, including the pre-training (just set a lower number of epochs), and then for the fine-tuning at step 3.3 point the model path to one of the checkpoint folders. For example, if the pre-training output folder was set to:
export OUTPUT_PATH=output$KMER
then for fine-tuning set:
export MODEL_PATH=output$KMER/checkpoint-1800/
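For context, the two paths differ in what the folders contain. The sketch below is an assumed layout based on the usual Transformers-style trainer output, not taken from this thread: the root output folder holds only the final model files, while each checkpoint-* subfolder additionally holds optimizer and scheduler state that the fine-tuning script then tries to resume.

# Hypothetical layout after pre-training (file names assumed, not verified against this repo):
# output$KMER/                  <- "root" folder; loading from here works
#   config.json
#   pytorch_model.bin
#   vocab.txt
#   checkpoint-1800/            <- loading from here raises the ValueError
#     config.json
#     pytorch_model.bin
#     optimizer.pt              <- optimizer state saved mid-training
#     scheduler.pt
#     training_args.bin
ls output$KMER
ls output$KMER/checkpoint-1800/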

Hello,

I'm having this same issue. I think there's a problem fine-tuning DNABERT from a checkpoint rather than from a completed training run. Have you found a solution to this? What do you mean by loading the model from the "root" folder of the pre-trained model? Are you referring to the provided sample pre-trained models?

Update:

Delete optimizer.pt, scheduler.pt, and training_args.bin from the checkpoint folder to fine-tune from a checkpoint.
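For example, assuming the checkpoint path from the reproduction steps above (adjust to your own checkpoint folder):

rm output$KMER/checkpoint-1800/optimizer.pt
rm output$KMER/checkpoint-1800/scheduler.pt
rm output$KMER/checkpoint-1800/training_args.bin

Then set MODEL_PATH to the checkpoint folder as before and run the fine-tuning step; with the saved optimizer and scheduler state removed, only the model weights are loaded.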