Can you confirm whether the chatglm3 model is the same as GPT, or is it an original GLM architecture?
tiendung opened this issue · comments
In the source code at sat/model/official/chatglm3_model.py, I cannot find the 2D positional encoding.
Yes, chatglm3 uses multiplicative 1D rotary position encoding. It is not the same as GPT, which uses additive absolute position embeddings.
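To make the distinction concrete, here is a minimal NumPy sketch (not the actual SAT implementation, which operates per attention head and handles pair interleaving differently) contrasting multiplicative rotary position encoding with a GPT-style additive absolute position embedding; the function names and the `pos_table` argument are illustrative, not from the repo:

```python
import numpy as np

def rope(x, pos, base=10000.0):
    """Rotary position encoding (RoPE): pairs of dimensions are rotated
    by a position-dependent angle, so position enters multiplicatively."""
    half = x.shape[-1] // 2
    freqs = base ** (-np.arange(half) / half)   # per-pair rotation frequencies
    angles = pos * freqs
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[..., :half], x[..., half:]
    # Rotate each (x1, x2) pair by its angle.
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

def additive_abs_pos(x, pos, pos_table):
    """GPT-style learned absolute position embedding: a per-position
    vector is simply added to the token embedding."""
    return x + pos_table[pos]
```

Note that the rotation in `rope` preserves the vector's norm and encodes relative offsets in attention dot products, while the additive scheme changes the embedding itself and only encodes absolute positions.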
So chatglm3 was trained only to predict the next token (without blank filling ...)?
I'm not sure. I didn't work on that part; I only converted the model weights into SAT format.