Questions about parameters.
Chunngai opened this issue
Hi. Thanks for your great work and for kindly releasing the source code. I have some questions.
i) What are the parameters used (common & specific)? In "training_parameters.md" you only mention the params for XLNet and RoBERTa.
ii) I should set `special_tokens_fix` to 0 for the BERT model, right? You have not mentioned that in the "training_parameters.md" file (not even for the XLNet model).
Hi @Chunngai
In training_parameters.md, for each stage we show the parameters for BERT; for RoBERTa and XLNet we show only the model-specific params. For BERT, `special_tokens_fix` is 0.
By the way, we have links for downloading the trained models' weights in the https://github.com/grammarly/gector#pretrained-models section.
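For reference, generating predictions with a downloaded BERT checkpoint would look something like the sketch below, following the `predict.py` usage from the README; all file paths here are placeholders:

```bash
# Placeholder paths; flag names follow the repo's predict.py.
python predict.py --model_path bert_gector.th \
                  --vocab_path data/output_vocabulary \
                  --input_file input.txt \
                  --output_file predictions.txt \
                  --transformer_model bert \
                  --special_tokens_fix 0   # 0 for BERT, as noted above
```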
@skurzhanskyi
So to train a BERT GEC model, the params in each stage should be as follows (a command sketch follows the list):
- stage 1:
  - 20 epochs + 3 patience (as in "Number of epochs and early stopping" in "training_parameters.md")
  - Params of "Same parameters for all stages" in "training_parameters.md"
  - Params of "Stage1 parameters" in "training_parameters.md"
  - `special_tokens_fix == 0`
- stage 2:
  - 20 epochs + 3 patience
  - Params of "Same parameters for all stages"
  - Params of "Stage2 parameters"
  - `special_tokens_fix == 0`
- stage 3:
  - 20 epochs + 3 patience
  - Params of "Same parameters for all stages"
  - Params of "Stage3 parameters"
  - `special_tokens_fix == 0`
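i.e., stage 1 would be launched with something like this sketch (dataset paths are placeholders; the flags from "Same parameters for all stages" and "Stage1 parameters" would be appended):

```bash
# Stage-1 sketch for BERT with placeholder data paths. Append the flags
# from "Same parameters for all stages" and "Stage1 parameters" as well.
python train.py --train_set stage1_train.txt \
                --dev_set stage1_dev.txt \
                --model_dir bert_stage1 \
                --transformer_model bert \
                --special_tokens_fix 0 \
                --n_epoch 20 \
                --patience 3
```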
For prediction, the params in each stage should be (see the sketch below):
- Params of "For prediction during stage1-3 we used" in "training_parameters.md"
- `special_tokens_fix == 0`
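For example, evaluating a stage checkpoint would look something like this (checkpoint and data paths are hypothetical; the decoding flags come from that section of "training_parameters.md"):

```bash
# Hypothetical checkpoint/vocab paths; append the decoding flags listed
# under "For prediction during stage1-3 we used".
python predict.py --model_path bert_stage1/best.th \
                  --vocab_path data/output_vocabulary \
                  --input_file dev_source.txt \
                  --output_file dev_pred.txt \
                  --transformer_model bert \
                  --special_tokens_fix 0
```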
And "Same parameters for all stages" is NOT for prediction in each stage.
Is that right?
We used fewer epochs for stage 2 and stage 3, similar to RoBERTa and XLNet.
You're right; we're missing some information for BERT.
@skurzhanskyi OK, thanks.
So can I set `n_epoch == 20` in the 2nd and 3rd stages, since `patience == 3` may stop the training early anyway? Or could you provide the epoch info for BERT?
I think yes.
You can find the best parameters for final predictions at https://github.com/grammarly/gector#pretrained-models.
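So a stage-2 run could keep `n_epoch == 20` and rely on early stopping, roughly as below. Paths are placeholders, and the `--pretrain_folder`/`--pretrain` names for continuing from the stage-1 checkpoint are assumptions; check `train.py --help` for the exact flags:

```bash
# Stage-2 sketch: early stopping (patience) halts training once the dev
# metric stops improving, so the exact n_epoch value matters less.
# --pretrain_folder / --pretrain are assumed names for loading the
# stage-1 checkpoint; verify against train.py --help.
python train.py --train_set stage2_train.txt \
                --dev_set stage2_dev.txt \
                --model_dir bert_stage2 \
                --transformer_model bert \
                --special_tokens_fix 0 \
                --n_epoch 20 \
                --patience 3 \
                --pretrain_folder bert_stage1 \
                --pretrain best
```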
Ok thanks. I'll try it.