labmlai / annotated_deep_learning_paper_implementations

🧑‍🏫 60 Implementations/tutorials of deep learning papers with side-by-side notes 📝; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, sophia, ...), gans(cyclegan, stylegan2, ...), 🎮 reinforcement learning (ppo, dqn), capsnet, distillation, ... 🧠

Home Page: https://nn.labml.ai

Bug in rotary positional embedding

scv11 opened this issue

I copied the original code, but it has an error: running it raises a tensor operation exception at this statement:

x_rope = (x_rope * self.cos_cached[:x.shape[0]]) + (neg_half_x * self.sin_cached[:x.shape[0]])

The error message looks like this:
RuntimeError: The size of tensor a (3) must match the size of tensor b (4) at non-singleton dimension 3
After debugging, I found that the positional encodings are applied only to a partial set of the features (the first 3 of the last dimension in this test), while cos_cached and sin_cached keep the same feature dimension as the original x tensor (4 in this test), so the element-wise multiplication fails with a shape mismatch. I think the code should be:

x_rope = ((x_rope * self.cos_cached[:x.shape[0], :, :, :self.d]) +
          (neg_half_x * self.sin_cached[:x.shape[0], :, :, :self.d]))
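For context, here is a minimal, self-contained sketch that reproduces the shape problem and applies the slicing suggested above. It is an illustrative reconstruction, not the labml implementation verbatim: the class name PartialRotaryPE, the d_head argument, and the cache layout are assumptions made for the example.

import torch
import torch.nn as nn

class PartialRotaryPE(nn.Module):
    # Rotary positional embeddings applied only to the first `d` features of each head.
    def __init__(self, d: int, d_head: int, base: float = 10_000.0):
        super().__init__()
        self.d = d            # number of features to rotate
        self.d_head = d_head  # full feature size of x; the caches cover this many features here
        self.base = base
        self.cos_cached = None
        self.sin_cached = None

    def _build_cache(self, x: torch.Tensor):
        # Build the caches over the full head dimension, as the issue describes.
        seq_len = x.shape[0]
        theta = 1.0 / (self.base ** (torch.arange(0, self.d_head, 2).float() / self.d_head))
        seq_idx = torch.arange(seq_len).float()
        idx_theta = torch.einsum('n,d->nd', seq_idx, theta)
        idx_theta2 = torch.cat([idx_theta, idx_theta], dim=1)
        self.cos_cached = idx_theta2.cos()[:, None, None, :]  # [seq_len, 1, 1, d_head]
        self.sin_cached = idx_theta2.sin()[:, None, None, :]

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [seq_len, batch, heads, d_head]
        self._build_cache(x)
        x_rope, x_pass = x[..., :self.d], x[..., self.d:]
        d_2 = self.d // 2
        neg_half_x = torch.cat([-x_rope[..., d_2:], x_rope[..., :d_2]], dim=-1)
        # Slicing the caches' last dimension to self.d keeps shapes aligned;
        # dropping the `:self.d` slice reproduces the RuntimeError reported above.
        x_rope = ((x_rope * self.cos_cached[:x.shape[0], :, :, :self.d]) +
                  (neg_half_x * self.sin_cached[:x.shape[0], :, :, :self.d]))
        return torch.cat((x_rope, x_pass), dim=-1)

x = torch.randn(5, 2, 1, 4)              # [seq_len, batch, heads, d_head]
out = PartialRotaryPE(d=2, d_head=4)(x)  # rotate 2 of the 4 features per head
print(out.shape)                         # torch.Size([5, 2, 1, 4])

With d=2 and d_head=4, x_rope and neg_half_x have 2 features while the caches have 4, so the sliced indexing is what lets the element-wise products broadcast correctly.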

If I have made any mistakes, please feel free to tell me.

Fixed it here 2236f63

Sorry for the delay