How to set hyperparameters for pretraining glm_doc?
ymr12 opened this issue · comments
ymr12 commented
- When pretraining glm_doc, what is the ratio of blank filling to document-level generation? Neither the paper nor the code states this. I tried 5:5 and 3:7, but the perplexity is always about three times the value reported in the paper. How were the relevant hyperparameters set?
- In my runs, the model overfits after 30k~40k iterations with batchsize=512. Is there any way to avoid this?
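For context, here is how I am currently mixing the two objectives. This is my own sketch, not GLM's actual implementation; the function and objective names are placeholders, and `blank_fill_ratio` is the hyperparameter I am asking about (e.g. 0.5 for the 5:5 setting, 0.3 for 3:7):

```python
import random

def choose_objective(blank_fill_ratio, rng=random):
    """Pick a pretraining objective per sample.

    Returns 'blank_fill' with probability blank_fill_ratio,
    otherwise 'doc_generation'. Placeholder names, not GLM code.
    """
    return "blank_fill" if rng.random() < blank_fill_ratio else "doc_generation"

# Quick sanity check of the mixing ratio over 10k draws:
rng = random.Random(0)
counts = {"blank_fill": 0, "doc_generation": 0}
for _ in range(10000):
    counts[choose_objective(0.3, rng)] += 1
# counts["blank_fill"] should be roughly 3000 for a 3:7 mix
```

If the intended setup samples the objective per batch rather than per sample, or uses a fixed schedule instead of random sampling, please correct me.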