GLM-130B model architecture hyperparameter question

Question

peiyingxin opened this issue a year ago

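GLM-130B is configured with ffn_hidden_size=12288, attention_head=96, and layers=70, whereas LLaMA-65B uses ffn_hidden_size=8192, attention_head=64, and layers=80. GLM-130B appears to be wider, while mainstream models in the industry appear to be deeper. What considerations led to this choice of hyperparameters when GLM-130B was designed? Thanks!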
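
To make the "wider vs. deeper" comparison concrete, here is a rough Python sketch that works only from the numbers quoted above. It assumes the quoted size can be treated as each model's per-layer width; the `Shape` class and its property names are made up for illustration and do not come from either codebase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shape:
    """Hypothetical container for the figures quoted in the question."""
    name: str
    width: int   # size quoted in the question, treated as per-layer width (assumption)
    heads: int   # attention heads
    layers: int  # transformer layers

    @property
    def width_to_depth(self) -> float:
        # Larger value = wider relative to its depth.
        return self.width / self.layers

    @property
    def width_per_head(self) -> float:
        return self.width / self.heads


glm = Shape("GLM-130B", width=12288, heads=96, layers=70)
llama = Shape("LLaMA-65B", width=8192, heads=64, layers=80)

for m in (glm, llama):
    print(f"{m.name}: width/layers = {m.width_to_depth:.1f}, "
          f"width/heads = {m.width_per_head:.0f}")
```

Under that assumption, both models have the same width per head (128), but GLM-130B's width-to-depth ratio is about 175.5 against 102.4 for LLaMA-65B, which is the gap being asked about.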