Implementation of the paper: "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models"