kingoflolz / mesh-transformer-jax

Model parallel transformers in JAX and Haiku

About RoPE embedding

eyuansu62 opened this issue

Why are the rotary position encodings (RoPE) applied to only 64 dimensions of each head rather than to the full head dimension?
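
To make the question concrete, here is a minimal JAX sketch of what "partial" rotary embedding means: only the first 64 channels of each head are rotated, and the remaining channels pass through unchanged. It is an illustration only, not the repository's code. The function names, the `rotary_dims` parameter, the base of 10000, and the head size of 256 in the usage lines are all assumptions made for this example.

```python
import numpy as np
import jax.numpy as jnp


def fixed_pos_embedding(seq_len, rotary_dims):
    # One inverse frequency per rotated pair of channels (base 10000 is assumed).
    inv_freq = 1.0 / (10000 ** (np.arange(0, rotary_dims, 2) / rotary_dims))
    # angles[p, i] = position p times frequency i, shape (seq_len, rotary_dims // 2)
    angles = np.einsum("i,j->ij", np.arange(seq_len), inv_freq)
    return jnp.sin(angles), jnp.cos(angles)


def rotate_every_two(x):
    # Map each adjacent pair (a, b) to (-b, a).
    x1 = x[..., ::2]
    x2 = x[..., 1::2]
    return jnp.stack((-x2, x1), axis=-1).reshape(x.shape)


def apply_partial_rope(x, rotary_dims=64):
    # x has shape (seq_len, n_heads, head_dim); rotary_dims is a hypothetical name.
    seq_len = x.shape[0]
    x_rot, x_pass = x[..., :rotary_dims], x[..., rotary_dims:]
    sin, cos = fixed_pos_embedding(seq_len, rotary_dims)
    # Repeat each angle twice so it lines up with the interleaved pairs,
    # then add a head axis for broadcasting.
    sin = jnp.repeat(sin, 2, axis=-1)[:, None, :]
    cos = jnp.repeat(cos, 2, axis=-1)[:, None, :]
    x_rot = x_rot * cos + rotate_every_two(x_rot) * sin
    # Channels beyond rotary_dims carry no positional rotation.
    return jnp.concatenate([x_rot, x_pass], axis=-1)


# Usage with assumed sizes: 8 positions, 4 heads, 256 channels per head.
x = jnp.asarray(np.random.default_rng(0).normal(size=(8, 4, 256)), dtype=jnp.float32)
y = apply_partial_rope(x, rotary_dims=64)
```

In this sketch, setting `rotary_dims` equal to the head size would rotate every channel, which is the "full dimensions" variant the question contrasts against.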