Questions about parameters.
Chunngai opened this issue
Hi. Thanks for your great work and for kindly releasing the source code. I have some questions.
i) What are the parameters used (common & specific)? In "training_parameters.md" you only mention the params for XLNet and RoBERTa.
ii) I should set `special_tokens_fix` to 0 for the BERT model, right? You have not mentioned that in the "training_parameters.md" file (not even for the XLNet model).
Hi @Chunngai
In training_parameters.md, for each stage we show the parameters for BERT; for RoBERTa and XLNet we show only the model-specific params. For BERT, `special_tokens_fix` is 0.
By the way, we have links for downloading the trained models' weights in the https://github.com/grammarly/gector#pretrained-models section.
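For reference, generating predictions with a downloaded BERT checkpoint would look something like the sketch below, following the `predict.py` usage from the README; all file paths here are placeholders:

```bash
# Placeholder paths; flag names follow the repo's predict.py.
python predict.py --model_path bert_gector.th \
                  --vocab_path data/output_vocabulary \
                  --input_file input.txt \
                  --output_file predictions.txt \
                  --transformer_model bert \
                  --special_tokens_fix 0   # 0 for BERT, as noted above
```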
@skurzhanskyi
So to train a BERT GEC model, the params in each stage should be as follows (a command sketch follows the list):
- stage 1:
  - 20 epochs + 3 patience (as in "Number of epochs and early stopping" in "training_parameters.md")
  - Params of "Same parameters for all stages" in "training_parameters.md"
  - Params of "Stage1 parameters" in "training_parameters.md"
  - `special_tokens_fix == 0`
- stage 2:
  - 20 epochs + 3 patience
  - Params of "Same parameters for all stages"
  - Params of "Stage2 parameters"
  - `special_tokens_fix == 0`
- stage 3:
  - 20 epochs + 3 patience
  - Params of "Same parameters for all stages"
  - Params of "Stage3 parameters"
  - `special_tokens_fix == 0`
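i.e., stage 1 would be launched with something like this sketch (dataset paths are placeholders; the flags from "Same parameters for all stages" and "Stage1 parameters" would be appended):

```bash
# Stage-1 sketch for BERT with placeholder data paths. Append the flags
# from "Same parameters for all stages" and "Stage1 parameters" as well.
python train.py --train_set stage1_train.txt \
                --dev_set stage1_dev.txt \
                --model_dir bert_stage1 \
                --transformer_model bert \
                --special_tokens_fix 0 \
                --n_epoch 20 \
                --patience 3
```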
For prediction, the params in each stage should be (see the sketch below):
- Params of "For prediction during stage1-3 we used" in "training_parameters.md"
- `special_tokens_fix == 0`
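For example, evaluating a stage checkpoint would look something like this (checkpoint and data paths are hypothetical; the decoding flags come from that section of "training_parameters.md"):

```bash
# Hypothetical checkpoint/vocab paths; append the decoding flags listed
# under "For prediction during stage1-3 we used".
python predict.py --model_path bert_stage1/best.th \
                  --vocab_path data/output_vocabulary \
                  --input_file dev_source.txt \
                  --output_file dev_pred.txt \
                  --transformer_model bert \
                  --special_tokens_fix 0
```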
And "Same parameters for all stages" is NOT for prediction in each stage.
Is that right?
We used fewer epochs for stage 2 and stage 3, similar to RoBERTa and XLNet.
You're right; we're missing some information for BERT.
@skurzhanskyi OK, thanks.
So can I set `n_epoch == 20` in the 2nd and 3rd stages, since `patience == 3` may stop the training early anyway? Or could you provide the epoch info for BERT?
I think yes.
You can find the best parameters for final predictions at https://github.com/grammarly/gector#pretrained-models.
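So a stage-2 run could keep `n_epoch == 20` and rely on early stopping, roughly as below. Paths are placeholders, and the `--pretrain_folder`/`--pretrain` names for continuing from the stage-1 checkpoint are assumptions; check `train.py --help` for the exact flags:

```bash
# Stage-2 sketch: early stopping (patience) halts training once the dev
# metric stops improving, so the exact n_epoch value matters less.
# --pretrain_folder / --pretrain are assumed names for loading the
# stage-1 checkpoint; verify against train.py --help.
python train.py --train_set stage2_train.txt \
                --dev_set stage2_dev.txt \
                --model_dir bert_stage2 \
                --transformer_model bert \
                --special_tokens_fix 0 \
                --n_epoch 20 \
                --patience 3 \
                --pretrain_folder bert_stage1 \
                --pretrain best
```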
Ok thanks. I'll try it.