Advice on training with an additional synthetic dataset
rachelwrr opened this issue
Hi,
Thanks for the work!
I'm just looking for advice. If I want to add a synthetic dataset targeting a few specific grammar errors, in what order would you recommend training the model? Will changing the order of the 3 training stages affect the result?
Fine-tune on top of your pretrained model (after Stage 3)?
Or
Restart the training process and include the new dataset in Stage 1?
I'm new to this area. Any advice would be appreciated :)
Thanks!
Hi
I think this depends on how much your errors differ from those in the dataset. In general, I would suggest adding these errors to Stage 1 and then applying Stage 2 & 3, as your data is synthetic.
Thanks for the reply! For the dataset, I took 60000 sentences from PIE folder a5 (true) and converted adjectives to adverbs, intending to improve correction of adjective/adverb conversion errors.
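For anyone trying something similar, here is a rough sketch of how an adjective-to-adverb corruption like the one described above could be generated. It is not the script actually used in this thread. The names `adj_to_adv`, `corrupt`, and `IRREGULAR` are made up for illustration, the adjective set is a hand-written stand-in for a real POS tagger, and the input is assumed to be whitespace-tokenized. The spelling rules cover only common cases and will get some words wrong.

```python
import random

# Hypothetical, tiny table of irregular adjective -> adverb pairs.
IRREGULAR = {"good": "well", "fast": "fast", "hard": "hard"}


def adj_to_adv(adj):
    """Map a lowercase adjective to an adverb using common spelling rules."""
    if adj in IRREGULAR:
        return IRREGULAR[adj]
    if adj.endswith("ic"):
        return adj + "ally"            # basic -> basically
    if adj.endswith("le") and len(adj) > 2 and adj[-3] not in "aeiou":
        return adj[:-1] + "y"          # gentle -> gently
    if adj.endswith("y") and len(adj) > 2 and adj[-2] not in "aeiou":
        return adj[:-1] + "ily"        # happy -> happily
    if adj.endswith("ll"):
        return adj + "y"               # full -> fully
    return adj + "ly"                  # quick -> quickly


def corrupt(sentence, adjectives, rng, prob=1.0):
    """Return (source_with_error, correct_target), or None if nothing changed.

    `adjectives` is a set of lowercase words treated as adjectives
    (an assumption standing in for POS tagging).
    """
    out = []
    changed = False
    for tok in sentence.split():
        low = tok.lower()
        if low in adjectives and rng.random() < prob:
            adv = adj_to_adv(low)
            if adv != low:
                # Keep the original capitalization of the first letter.
                out.append(adv.capitalize() if tok[0].isupper() else adv)
                changed = True
                continue
        out.append(tok)
    return (" ".join(out), sentence) if changed else None


if __name__ == "__main__":
    rng = random.Random(0)
    pair = corrupt("She gave a quick answer .", {"quick"}, rng)
    print(pair)
```

Each returned pair puts the corrupted sentence first and the original sentence second, so the original serves as the correction target. Lowering `prob` leaves some adjectives untouched, which keeps the error rate per sentence closer to what natural text might show.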