grammarly / gector

Official implementation of the papers "GECToR – Grammatical Error Correction: Tag, Not Rewrite" (BEA-20) and "Text Simplification by Tagging" (BEA-21)


Questions about parameters.

Chunngai opened this issue

Hi. Thanks for your great work and for kindly releasing the source code. I have some questions.
i) What are the parameters used (common & model-specific)? In "training_parameters.md" you only mention params for XLNet and RoBERTa.
ii) I should set special_tokens_fix to 0 for the BERT model, right? You have not mentioned that in the "training_parameters.md" file (even for the XLNet model).

Hi @Chunngai
In training_parameters.md, for each stage we show the parameters for BERT; for RoBERTa and XLNet we show model-specific params. For BERT, special_tokens_fix is 0.
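For illustration, a minimal stage-1 training command for a BERT model might look like the sketch below. The flag names follow the repo's train.py; the paths are placeholders, and the remaining hyperparameters still need to be added from training_parameters.md.

```
# Minimal sketch of training with a BERT encoder. Paths are placeholders;
# the stage-specific hyperparameters from training_parameters.md
# still need to be appended to this command.
python train.py --train_set data/stage1_train.txt \
                --dev_set data/stage1_dev.txt \
                --model_dir models/bert_stage1 \
                --transformer_model bert \
                --special_tokens_fix 0
```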

By the way, there are links for downloading the trained models' weights in the https://github.com/grammarly/gector#pretrained-models section.

@skurzhanskyi
So, to train a BERT GEC model, the params in each stage should be (see the sketch after this list):

  1. stage 1:
    • 20 epochs + 3 patience (as in "Number of epochs and early stopping" in "training_parameters.md")
    • Params of "Same parameters for all stages" in "training_parameters.md"
    • Params of "Stage1 parameters" in "training_parameters.md"
    • special_tokens_fix == 0
  2. stage 2:
    • 20 epochs + 3 patience
    • Params of "Same parameters for all stages"
    • Params of "Stage2 parameters"
    • special_tokens_fix == 0
  3. stage 3:
    • 20 epochs + 3 patience
    • Params of "Same parameters for all stages"
    • Params of "Stage3 parameters"
    • special_tokens_fix == 0
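Concretely, I picture the three stages chaining together roughly as below. This is only a sketch: all paths and filenames are placeholders, warm-starting is assumed to go through train.py's --pretrain_folder/--pretrain flags, and the per-stage flags must be copied from training_parameters.md.

```
# Stage 1: pretraining on synthetic errorful data.
python train.py --train_set data/stage1_train.txt --dev_set data/stage1_dev.txt \
                --model_dir models/bert_stage1 \
                --transformer_model bert --special_tokens_fix 0 \
                --n_epoch 20 --patience 3
                # ...plus the "Same parameters for all stages"
                # and "Stage1 parameters" flags

# Stage 2: fine-tuning on errorful parallel corpora, warm-started from stage 1.
python train.py --train_set data/stage2_train.txt --dev_set data/stage2_dev.txt \
                --model_dir models/bert_stage2 \
                --pretrain_folder models/bert_stage1 --pretrain best \
                --transformer_model bert --special_tokens_fix 0 \
                --n_epoch 20 --patience 3
                # ...plus the common and "Stage2 parameters" flags

# Stage 3: fine-tuning on a mix of errorful and error-free sentences.
python train.py --train_set data/stage3_train.txt --dev_set data/stage3_dev.txt \
                --model_dir models/bert_stage3 \
                --pretrain_folder models/bert_stage2 --pretrain best \
                --transformer_model bert --special_tokens_fix 0 \
                --n_epoch 20 --patience 3
                # ...plus the common and "Stage3 parameters" flags
```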

For prediction, the params in each stage should be (see the sketch below):

  • Params of "For prediction during stage1-3 we used" in "training_parameters.md"
  • special_tokens_fix == 0

And "Same parameters for all stages" is NOT for prediction in each stage.

Is that right?

We had fewer epochs for stage 2 and stage 3, similar to RoBERTa and XLNet.
You're right; we're missing some information for BERT.

@skurzhanskyi Ok thanks.
So can I set n_epoch == 20 in the 2nd and 3rd stages, since patience == 3 may stop training early anyway? Or could you provide the epoch counts for BERT?

I think yes.

You can find the best parameters for final predictions at https://github.com/grammarly/gector#pretrained-models.
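For completeness, a final-inference sketch with the tunable decoding knobs from that table; the numeric values below are placeholders, not the tuned BERT settings.

```
# Final inference sketch: iteration_count, additional_confidence and
# min_error_probability are the per-model knobs tuned in the
# #pretrained-models table; the values below are placeholders.
python predict.py --model_path models/bert_stage3/best.th \
                  --vocab_path data/output_vocabulary \
                  --input_file input.txt \
                  --output_file output.txt \
                  --transformer_model bert \
                  --special_tokens_fix 0 \
                  --iteration_count 5 \
                  --additional_confidence 0.1 \
                  --min_error_probability 0.4
```

Here iteration_count controls how many times the tagger is re-applied to its own output, while the confidence bias and minimum error probability push the model toward keeping tokens unchanged.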

Ok thanks. I'll try it.