labmlai / annotated_deep_learning_paper_implementations

🧑‍🏫 60 Implementations/tutorials of deep learning papers with side-by-side notes 📝; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, sophia, ...), gans(cyclegan, stylegan2, ...), 🎮 reinforcement learning (ppo, dqn), capsnet, distillation, ... 🧠

Home Page: https://nn.labml.ai

Bug in implementation of Rotary Positional Embeddings

Inkorak opened this issue · comments

If you run this example code, it raises the following error:

x_rope = (x_rope * self.cos_cached[:x.shape[0]]) + (neg_half_x * self.sin_cached[:x.shape[0]])
          ~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

RuntimeError: The size of tensor a (3) must match the size of tensor b (4) at non-singleton dimension 3
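The failure mode can be reproduced in isolation: PyTorch broadcasting requires that each trailing dimension either match or be 1, so multiplying a tensor whose last dimension is 3 by a cached table whose last dimension is 4 fails exactly like the traceback above. The shapes here are illustrative, not the repository's actual ones:

```python
import torch

# Hypothetical shapes mirroring the failing line: the sliced features have
# a smaller last dimension (3) than the cached cos table (4).
x_rope = torch.randn(5, 2, 3, 3)      # [seq_len, batch, heads, d_rope]
cos_cached = torch.randn(5, 1, 1, 4)  # [seq_len, 1, 1, d_cache], d_cache != d_rope

try:
    # Slicing only the sequence axis (as the original line does) leaves the
    # last dimensions mismatched: 3 vs 4, and neither is 1, so broadcasting fails.
    _ = x_rope * cos_cached[: x_rope.shape[0]]
except RuntimeError as e:
    print(e)  # size of tensor a (3) must match size of tensor b (4) at dimension 3
```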

It seems the problem is in how the features are split: the slice that selects the part of the features RoPE is applied to does not line up with the shape of the cached cos/sin tables.

The correct code should most likely be something like this:

x_rope = (x_rope * self.cos_cached[:, :, :, :x_rope.shape[0]]) + (neg_half_x * self.sin_cached[:, :, :, :x_rope.shape[0]])

I agree that line is wrong, but I thought it should be:

x_rope = (x_rope * self.cos_cached[..., :self.d]) + (neg_half_x * self.sin_cached[..., :self.d])

If you disagree, please explain more! I want to know!
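For reference, here is a minimal, self-contained sketch of rotary embeddings that slices the cached tables on both the sequence axis and the feature axis, so the shapes broadcast even when the cache is longer or wider than the current input. This is only an illustration of the fix being discussed, not the repository's actual code; `build_rope_cache` and `apply_rope` are hypothetical names:

```python
import torch


def build_rope_cache(seq_len: int, d: int, base: float = 10_000.0):
    """Precompute cos/sin tables of shape [seq_len, 1, 1, d]."""
    # One frequency per feature pair, as in the RoPE formulation.
    theta = 1.0 / base ** (torch.arange(0, d, 2).float() / d)  # [d // 2]
    idx = torch.arange(seq_len).float()                        # [seq_len]
    angles = torch.einsum("n,t->nt", idx, theta)               # [seq_len, d // 2]
    angles = torch.cat([angles, angles], dim=-1)               # [seq_len, d]
    # Insert singleton batch/head axes so the tables broadcast against
    # [seq_len, batch, heads, d] inputs.
    return angles.cos()[:, None, None, :], angles.sin()[:, None, None, :]


def apply_rope(x: torch.Tensor, d: int, cos: torch.Tensor, sin: torch.Tensor):
    """Rotate the first `d` features of x: [seq_len, batch, heads, d_head]."""
    x_rope, x_pass = x[..., :d], x[..., d:]
    half = d // 2
    # "Rotate half": [-x2, x1], matching the rotation-matrix form of RoPE.
    neg_half = torch.cat([-x_rope[..., half:], x_rope[..., :half]], dim=-1)
    # Slice the cache on BOTH the sequence axis and the feature axis, so the
    # multiply broadcasts even when the cache is longer or wider than needed.
    cos = cos[: x.shape[0], ..., :d]
    sin = sin[: x.shape[0], ..., :d]
    x_rope = x_rope * cos + neg_half * sin
    return torch.cat([x_rope, x_pass], dim=-1)
```

At position 0 the rotation angle is zero, so the output should equal the input there, and the features past index `d` should pass through untouched in every position; both properties are easy sanity checks for any proposed fix.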