How to set hyperparameters for pretraining glm_doc?
ymr12 opened this issue · comments
ymr12 commented
- When pretraining glm_doc, what is the ratio of blank filling to document-level generation? Neither the paper nor the code states this. I tried 5:5 and 3:7, but the perplexity is always about three times the value reported in the paper. How were the relevant hyperparameters set?
- In my runs, the model overfits after 30k~40k iterations with batchsize=512. Is there any way to avoid this?
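For context, here is how I am currently mixing the two objectives. This is my own sketch, not GLM's actual implementation; the function and objective names are placeholders, and `blank_fill_ratio` is the hyperparameter I am asking about (e.g. 0.5 for the 5:5 setting, 0.3 for 3:7):

```python
import random

def choose_objective(blank_fill_ratio, rng=random):
    """Pick a pretraining objective per sample.

    Returns 'blank_fill' with probability blank_fill_ratio,
    otherwise 'doc_generation'. Placeholder names, not GLM code.
    """
    return "blank_fill" if rng.random() < blank_fill_ratio else "doc_generation"

# Quick sanity check of the mixing ratio over 10k draws:
rng = random.Random(0)
counts = {"blank_fill": 0, "doc_generation": 0}
for _ in range(10000):
    counts[choose_objective(0.3, rng)] += 1
# counts["blank_fill"] should be roughly 3000 for a 3:7 mix
```

If the intended setup samples the objective per batch rather than per sample, or uses a fixed schedule instead of random sampling, please correct me.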