THUDM / SwissArmyTransformer

SwissArmyTransformer is a flexible and powerful library to develop your own Transformer variants.

Home Page: https://THUDM.github.io/SwissArmyTransformer

Can you confirm whether the chatglm3 model is the same as GPT, or whether it comes from the original GLM architecture?

tiendung opened this issue · comments

Looking at the source code in sat/model/official/chatglm3_model.py, I cannot find GLM's 2D positional encoding.

Yes, chatglm3 uses multiplicative 1D rotary position embeddings. But it is not the same as GPT, because GPT uses additive absolute position embeddings.
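The distinction above can be sketched in a few lines of NumPy. This is a minimal illustration of the two schemes, not the actual SAT or chatglm3 implementation: rotary embeddings multiply pairs of feature dimensions by a position-dependent rotation, while GPT-style absolute embeddings add a learned per-position vector. The function names and the `base=10000.0` constant are assumptions for illustration (10000 is the value commonly used in RoPE).

```python
import numpy as np

def rotary_embed(x, base=10000.0):
    """Multiplicative rotary position embedding (RoPE) sketch.

    x: (seq_len, dim) array with even dim. Each pair of feature
    dimensions is rotated by an angle that grows with position.
    """
    seq_len, dim = x.shape
    half = dim // 2
    # per-pair rotation frequencies, from fast to slow
    inv_freq = 1.0 / (base ** (np.arange(half) / half))
    angles = np.arange(seq_len)[:, None] * inv_freq[None, :]  # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    # 2D rotation applied to each (x1, x2) pair — multiplicative, not additive
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

def additive_embed(x, pos_table):
    """GPT-style absolute position embedding sketch.

    pos_table: (max_len, dim) learned table, simply added to the
    token embeddings — the position signal is absolute, not relative.
    """
    return x + pos_table[: x.shape[0]]
```

One consequence of the multiplicative form: a rotation preserves vector norms, and the dot product between two rotated vectors depends only on their relative distance, which is why RoPE encodes relative position inside attention scores.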

So chatglm3 was trained only to predict the next token (without filling in blanks ...)?

I'm not sure. I didn't work on the training; I only converted the model weights into SAT.