Can you confirm whether the chatglm3 model is the same as GPT, or is it an original GLM architecture?
tiendung opened this issue · comments
In the source code at sat/model/official/chatglm3_model.py, I cannot find the 2D positional encoding.
Yes, chatglm3 uses multiplicative 1D rotary position encoding. It is not the same as GPT, which uses additive absolute position embeddings.
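To make the distinction concrete, here is a minimal NumPy sketch (not the actual SAT implementation, which operates per attention head and handles pair interleaving differently) contrasting multiplicative rotary position encoding with a GPT-style additive absolute position embedding; the function names and the `pos_table` argument are illustrative, not from the repo:

```python
import numpy as np

def rope(x, pos, base=10000.0):
    """Rotary position encoding (RoPE): pairs of dimensions are rotated
    by a position-dependent angle, so position enters multiplicatively."""
    half = x.shape[-1] // 2
    freqs = base ** (-np.arange(half) / half)   # per-pair rotation frequencies
    angles = pos * freqs
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[..., :half], x[..., half:]
    # Rotate each (x1, x2) pair by its angle.
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

def additive_abs_pos(x, pos, pos_table):
    """GPT-style learned absolute position embedding: a per-position
    vector is simply added to the token embedding."""
    return x + pos_table[pos]
```

Note that the rotation in `rope` preserves the vector's norm and encodes relative offsets in attention dot products, while the additive scheme changes the embedding itself and only encodes absolute positions.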
So chatglm3 was trained only to predict the next token (without blank filling ...)?
I'm not sure. I didn't work on that part; I only converted the model weights into SAT format.