Why do we need to train for Stage II and Stage III? Why not just train for one stage on the annotated dataset?
chengyang00 opened this issue · comments
Cheng Yang commented
I'd like to know why training in multiple stages improves performance. Thanks!
Abhinav Dayal commented
My understanding: Stage 1 uses synthetic data, which is huge, so the bulk of the training is done on it. Stages 2 and 3 use manually annotated, accurate data containing the kinds of errors humans actually make. That data is tiny compared to the synthetic data, which is why they call it fine-tuning rather than training.
Alex Skurzhanskyi commented
Thanks for answering this. You're right – different stages have data of different quality.
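
For anyone who wants to see the idea in isolation, here is a minimal toy sketch of staged training. It is not the repository's training code: the 1-D linear model, the `make_data` and `sgd` helpers, the slopes, noise levels, and learning rates are all hypothetical choices for illustration, and the two fine-tuning stages are collapsed into one. It assumes the large synthetic set is noisy and slightly off-target, while the small annotated set is accurate. Under those assumptions, it compares fine-tuning from the Stage 1 weights against training on the small set alone with the same budget.

```python
import random

TRUE_SLOPE = 2.0  # hypothetical target the "annotated" data reflects


def make_data(n, slope, noise, rng):
    """Sample n points (x, y) with y = slope * x + Gaussian noise."""
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        data.append((x, slope * x + rng.gauss(0.0, noise)))
    return data


def sgd(w, data, lr, epochs):
    """Plain SGD on squared error for the model y ~ w * x, starting from w."""
    for _ in range(epochs):
        for x, y in data:
            w -= lr * 2.0 * (w * x - y) * x
    return w


rng = random.Random(0)

# Assumption: synthetic data is plentiful but noisy and slightly off-target.
synthetic = make_data(5000, slope=1.7, noise=0.5, rng=rng)
# Assumption: annotated data is accurate but tiny.
annotated = make_data(50, slope=TRUE_SLOPE, noise=0.05, rng=rng)

# Stage 1: train from scratch on the large synthetic set.
w_stage1 = sgd(0.0, synthetic, lr=0.01, epochs=1)
# Later stage: fine-tune the Stage 1 weights on the small annotated set.
w_finetuned = sgd(w_stage1, annotated, lr=0.01, epochs=2)
# Baseline: same small budget on annotated data, but starting from scratch.
w_scratch = sgd(0.0, annotated, lr=0.01, epochs=2)

for name, w in [("stage 1 only", w_stage1),
                ("annotated only", w_scratch),
                ("stage 1 + fine-tune", w_finetuned)]:
    print(f"{name}: w = {w:.3f}, error = {abs(w - TRUE_SLOPE):.3f}")
```

In this toy, the synthetic stage gets the weight most of the way there and the small accurate set only has to correct the remaining gap. A real model differs in many ways (overfitting on small data matters far more than step budget), so treat this only as an intuition aid.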