Model parallel transformers in JAX and Haiku
eyuansu62 opened this issue 9 months ago
Why are the rotary position encodings (RoPE) applied to only 64 dimensions of each head rather than to the full head dimension?
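For context, here is a minimal sketch of what "applied to 64 dimensions of each head" means in practice. It is not the repository's code. The head size of 256, the interleaved pairing of features, the base of 10000, and the helper names `rotate_pairs` and `partial_rope` are all assumptions made for illustration. It uses plain Python so it runs without JAX installed.

```python
import math

def rotate_pairs(x, position, base=10000.0):
    """Apply a rotary encoding to an even-length list of floats.

    Consecutive pairs (x[0], x[1]), (x[2], x[3]), ... are each rotated by
    the angle position * base ** (-2 * i / len(x)), where i is the pair index.
    The interleaved pairing is an assumption; other layouts exist.
    """
    dim = len(x)
    out = []
    for i in range(dim // 2):
        theta = position * base ** (-2.0 * i / dim)
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[2 * i], x[2 * i + 1]
        out.extend([a * c - b * s, a * s + b * c])
    return out

def partial_rope(head, position, rotary_dim=64):
    """Rotate only the first rotary_dim features; pass the rest through unchanged."""
    return rotate_pairs(head[:rotary_dim], position) + head[rotary_dim:]

# Hypothetical head with 256 features at sequence position 3.
head = [0.01 * i for i in range(256)]
encoded = partial_rope(head, position=3)
```

In this sketch only the first 64 features carry position information. The remaining 192 are identical to the input, which is the behaviour the question is asking about.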