Griffin Jax

A hybrid model that mixes gated linear recurrences with local attention.
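To make "gated linear recurrences with local attention" concrete, here is a minimal JAX sketch of the two ingredients. It is not this repository's code. The recurrence is modeled loosely on the real-gated linear recurrent unit (RG-LRU) described in the Griffin paper. The decay exponent constant `c = 8` is an assumed default, and the gate biases, convolution, and multi-head structure are omitted. All function and parameter names (`gated_linear_recurrence`, `a_param`, `w_a`, `w_x`, `local_attention`, `window`) are hypothetical.

```python
import jax
import jax.numpy as jnp


def gated_linear_recurrence(x, a_param, w_a, w_x, c=8.0):
    """Sketch of a real-gated linear recurrence over one sequence (hypothetical API).

    x: [T, D] inputs; a_param: [D] decay logits; w_a, w_x: [D, D] gate weights.
    """
    r = jax.nn.sigmoid(x @ w_a)  # recurrence gate r_t in (0, 1)
    i = jax.nn.sigmoid(x @ w_x)  # input gate i_t in (0, 1)
    # a = sigmoid(a_param) and a_t = a ** (c * r_t), computed in log space:
    # log(sigmoid(z)) == -softplus(-z).
    a_t = jnp.exp(-c * r * jax.nn.softplus(-a_param))
    # Scale the gated input by sqrt(1 - a_t^2) so the state stays bounded.
    u = jnp.sqrt(1.0 - a_t**2) * (i * x)

    def step(h, inputs):
        a, u_t = inputs
        h = a * h + u_t  # linear in h: no nonlinearity inside the recurrence
        return h, h

    h0 = jnp.zeros(x.shape[-1], dtype=x.dtype)
    _, hs = jax.lax.scan(step, h0, (a_t, u))
    return hs  # [T, D]


def local_causal_mask(seq_len, window):
    """True where query q may attend to key k: causal and within `window` steps."""
    q = jnp.arange(seq_len)[:, None]
    k = jnp.arange(seq_len)[None, :]
    return (k <= q) & (k > q - window)


def local_attention(q, k, v, window):
    """Single-head sliding-window causal attention. q, k, v: [T, D]."""
    scores = q @ k.T / (q.shape[-1] ** 0.5)
    mask = local_causal_mask(q.shape[0], window)
    scores = jnp.where(mask, scores, -jnp.inf)  # the diagonal is always allowed
    weights = jax.nn.softmax(scores, axis=-1)
    return weights @ v


# Usage: chain one recurrence and one local-attention layer on random data.
k1, k2, k3 = jax.random.split(jax.random.PRNGKey(0), 3)
T, D = 16, 8
x = jax.random.normal(k1, (T, D))
h = gated_linear_recurrence(
    x, jnp.zeros(D), 0.1 * jax.random.normal(k2, (D, D)), 0.1 * jax.random.normal(k3, (D, D))
)
y = local_attention(h, h, h, window=4)
print(h.shape, y.shape)  # (16, 8) (16, 8)
```

The recurrence carries a fixed-size state, so its cost grows linearly with sequence length. The attention window bounds each token's lookback to `window` positions. A full model would wrap both layers in residual blocks with normalization and MLPs. This sketch only chains one of each.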

Griffin-3B outperforms Mamba-3B, while Griffin-7B and Griffin-14B achieve performance competitive with Llama-2 despite being trained on nearly 7 times fewer tokens.
Griffin can extrapolate to sequences significantly longer than those seen during training.

[ ] Usage and training code will be added to the repository.

About

JAX implementation of "Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models"

Apache License 2.0
