[nanoChatGPT] weight tying embedding
apbard opened this issue
Shouldn't this be transposed? I.e.:
self.transformer.wte.weight = torch.t(self.lm_head.weight)
Conceptually the two weight matrices are transposes of each other, but nn.Embedding and nn.Linear already store their weights in different layouts: Embedding as (num_embeddings, embedding_dim) and Linear as (out_features, in_features). Both come out as (vocab_size, n_embd), so the tensors already have the same shape and no transpose is needed.
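A minimal sketch of the point (the sizes here are illustrative, not taken from the repo):

```python
import torch.nn as nn

vocab_size, n_embd = 50304, 768  # illustrative sizes

# nn.Embedding stores its weight as (num_embeddings, embedding_dim)
wte = nn.Embedding(vocab_size, n_embd)
# nn.Linear stores its weight as (out_features, in_features)
lm_head = nn.Linear(n_embd, vocab_size, bias=False)

print(wte.weight.shape)      # torch.Size([50304, 768])
print(lm_head.weight.shape)  # torch.Size([50304, 768]) -- same shape already

# So the tying assignment works directly, no torch.t() needed:
wte.weight = lm_head.weight
assert wte.weight.data_ptr() == lm_head.weight.data_ptr()  # one shared tensor
```

Because nn.Linear applies y = x @ W.T internally, sharing the raw (vocab_size, n_embd) tensor gives exactly the tied-embedding behavior without any explicit transpose in user code.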