prajdabre / yanmtt

Yet Another Neural Machine Translation Toolkit


Which pre-trained model should we use for fine-tuning?

Aniruddha-JU opened this issue

I have pre-trained the IndicBART model on new monolingual data, and two models are saved in the model path: 1) IndicBART and 2) IndicBART_puremodel. Which one should we use during fine-tuning?

The IndicBART checkpoint is 2.4 GB and the pure_model is 932 MB.

Either works:

1) Use the pure model with the flag --pretrained_model.

2) Use the larger model with the flag --pretrained_model, plus the additional flag --no_reload_optimizer_ctr_and_scheduler.

The larger checkpoint also contains the optimizer and scheduler states, so that pre-training can be resumed after a crash. During fine-tuning, resetting the optimizer is more common, which is why the pure model alone is enough.
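If you want to verify what each file contains before fine-tuning, a minimal sketch along these lines can help. It assumes the checkpoints are ordinary PyTorch torch.save artifacts and that the full checkpoint is a dict bundling model, optimizer, and scheduler state; the file paths and exact key names below are placeholders and may differ depending on your yanmtt version.

```python
import torch

# Paths are placeholders; point them at your own saved checkpoints.
# weights_only=False is needed on newer PyTorch (where weights_only
# defaults to True) to unpickle optimizer/scheduler objects; drop the
# argument on very old PyTorch versions that lack it.
full_ckpt = torch.load("IndicBART", map_location="cpu",
                       weights_only=False)            # ~2.4 GB file
pure_ckpt = torch.load("IndicBART_puremodel", map_location="cpu",
                       weights_only=False)            # ~932 MB file

# The full checkpoint is typically a dict that bundles training state
# (model weights plus optimizer/scheduler/counter entries); the exact
# key names vary by version.
if isinstance(full_ckpt, dict):
    print("Full checkpoint keys:", list(full_ckpt.keys()))

# The pure model should be just the model weights (a state_dict),
# which is all fine-tuning needs once the optimizer is reset.
print("Pure model parameter tensors:", len(pure_ckpt))
```

If the full checkpoint prints extra optimizer/scheduler entries alongside the model weights, that accounts for the size difference noted above.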