THUDM / GLM

GLM (General Language Model)

How to set hyperparameters when pretraining glm_doc?

ymr12 opened this issue

  1. When pretraining glm_doc, what is the ratio of blank infilling to document-level generation (see the sketch after this list for what I mean by the ratio)? Neither the paper nor the code specifies it. I tried 5:5 and 3:7, but the perplexity is always about three times the value reported in the paper. How are the relevant hyperparameters set?
  2. In my experiments, the model overfits after 30k–40k iterations with batch size 512. Is there any solution?
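
For concreteness, here is a minimal Python sketch of what I mean by the ratio, assuming the objective is sampled per example; all names (`BLANK_FILLING_PROB`, `blank_filling_example`, `document_generation_example`) are illustrative, not GLM's actual API or flags:

```python
import random

# Hypothetical mixing ratio: 0.5 gives a 5:5 mix, 0.3 would give 3:7.
BLANK_FILLING_PROB = 0.5

MASK = "[MASK]"

def blank_filling_example(tokens, rng):
    """Blank infilling: replace one short random span with a mask token."""
    start = rng.randrange(len(tokens))
    length = min(rng.randint(1, 3), len(tokens) - start)
    return tokens[:start] + [MASK] + tokens[start + length:]

def document_generation_example(tokens, rng):
    """Document-level generation: mask a long suffix covering most of the doc."""
    start = rng.randint(1, max(1, len(tokens) // 2))
    return tokens[:start] + [MASK]

def make_example(tokens, rng=random):
    """Sample one of the two objectives per example according to the ratio."""
    if rng.random() < BLANK_FILLING_PROB:
        return blank_filling_example(tokens, rng)
    return document_generation_example(tokens, rng)

if __name__ == "__main__":
    rng = random.Random(0)
    doc = "the quick brown fox jumps over the lazy dog".split()
    print(make_example(doc, rng))
```

What I want to know is which value of this per-example probability (or its equivalent in the actual pretraining config) was used for glm_doc.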