grammarly / gector

Official implementation of the papers "GECToR – Grammatical Error Correction: Tag, Not Rewrite" (BEA-20) and "Text Simplification by Tagging" (BEA-21)

Advice about training with additional synthetic dataset

rachelwrr opened this issue

Hi,

Thanks for the work!

I'm just looking for advice. If I want to add a synthetic dataset targeting a few specific grammar errors, in what order would you recommend training the model? Would changing the order of the 3 training stages affect the result?

Fine-tune on top of your pretrained model (after Stage 3)?
Or
Restart the training process and include the new dataset in Stage 1?

I'm new to this area. Any advice will be appreciated :)

Thanks!

Hi
I think this depends on how much your errors differ from those in the dataset. In general, I would suggest adding these errors to Stage 1 and then applying Stage 2 & 3, as your data is synthetic.
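
As a rough illustration of "adding these errors to Stage 1", the synthetic pairs could be mixed into the existing Stage 1 pairs and shuffled before preprocessing. This is only a sketch: the function name `merge_for_stage1` and the in-memory pair format are assumptions for illustration, not part of the GECToR code.

```python
import random
from typing import List, Tuple

# (source sentence containing errors, corrected target sentence)
Pair = Tuple[str, str]


def merge_for_stage1(
    stage1_pairs: List[Pair],
    synthetic_pairs: List[Pair],
    seed: int = 0,
) -> List[Pair]:
    """Combine existing Stage 1 pairs with new synthetic pairs.

    Hypothetical helper: it concatenates both lists and shuffles them
    with a fixed seed so the new error type is spread across the
    training data instead of being clustered at the end.
    """
    merged = list(stage1_pairs) + list(synthetic_pairs)
    random.Random(seed).shuffle(merged)
    return merged
```

Stages 2 and 3 would then be run afterwards as usual, on their own data.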

Thanks for the reply! For the dataset, I took 60,000 sentences from the PIE folder a5 (true), then converted adjectives to adverbs, intending to improve correction of adjective/adverb conversion errors.
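
A minimal sketch of this kind of corruption, assuming a small hand-written lookup table (the `ADJ_TO_ADV` lexicon and the `corrupt` function are hypothetical, not taken from PIE or GECToR): it takes a correct sentence, swaps the first adjective or adverb it recognizes for its counterpart, and returns an (erroneous source, correct target) pair. A real pipeline would more likely rely on a POS tagger and a larger morphological lexicon.

```python
from typing import Dict, Optional, Tuple

# Hypothetical toy lexicon; a real setup would need far wider coverage.
ADJ_TO_ADV: Dict[str, str] = {
    "quick": "quickly",
    "careful": "carefully",
    "slow": "slowly",
}
ADV_TO_ADJ: Dict[str, str] = {adv: adj for adj, adv in ADJ_TO_ADV.items()}
SWAP: Dict[str, str] = {**ADJ_TO_ADV, **ADV_TO_ADJ}


def corrupt(sentence: str) -> Optional[Tuple[str, str]]:
    """Return (corrupted_source, correct_target), or None if nothing matches.

    Only the first matching token is swapped, and the casing of the
    replaced token is not preserved.
    """
    tokens = sentence.split()
    for i, tok in enumerate(tokens):
        swapped = SWAP.get(tok.lower())
        if swapped is not None:
            corrupted = tokens[:i] + [swapped] + tokens[i + 1:]
            return " ".join(corrupted), sentence
    return None
```

For example, `corrupt("She drives carefully .")` yields the pair `("She drives careful .", "She drives carefully .")`, while a sentence with no word from the lexicon yields `None` and can simply be skipped.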