prajdabre / yanmtt

Yet Another Neural Machine Translation Toolkit


Which pre-trained model should we use for fine-tuning?

Aniruddha-JU opened this issue

I have pre-trained the IndicBART model on new monolingual data, and two models are saved in the model path: 1) IndicBART and 2) IndicBART_puremodel. Which one should we use during fine-tuning?

The IndicBART checkpoint is 2.4 GB and the pure_model is 932 MB.

Either works:

1) Use the pure model with the flag --pretrained_model.

2) Use the larger model with the flag --pretrained_model, plus the additional flag --no_reload_optimizer_ctr_and_scheduler.

The larger checkpoint also contains the optimizer and scheduler states, so that pre-training can be resumed after a crash. During fine-tuning, resetting the optimizer is more common, which is why the pure model alone is enough.
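If you want to verify what each file contains before fine-tuning, a minimal sketch along these lines can help. It assumes the checkpoints are ordinary PyTorch torch.save artifacts and that the full checkpoint is a dict bundling model, optimizer, and scheduler state; the file paths and exact key names below are placeholders and may differ depending on your yanmtt version.

```python
import torch

# Paths are placeholders; point them at your own saved checkpoints.
# weights_only=False is needed on newer PyTorch (where weights_only
# defaults to True) to unpickle optimizer/scheduler objects; drop the
# argument on very old PyTorch versions that lack it.
full_ckpt = torch.load("IndicBART", map_location="cpu",
                       weights_only=False)            # ~2.4 GB file
pure_ckpt = torch.load("IndicBART_puremodel", map_location="cpu",
                       weights_only=False)            # ~932 MB file

# The full checkpoint is typically a dict that bundles training state
# (model weights plus optimizer/scheduler/counter entries); the exact
# key names vary by version.
if isinstance(full_ckpt, dict):
    print("Full checkpoint keys:", list(full_ckpt.keys()))

# The pure model should be just the model weights (a state_dict),
# which is all fine-tuning needs once the optimizer is reset.
print("Pure model parameter tensors:", len(pure_ckpt))
```

If the full checkpoint prints extra optimizer/scheduler entries alongside the model weights, that accounts for the size difference noted above.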