Building GPT-2 from scratch, modified with rotary position embeddings (RoPE), sliding window attention, and grouped query attention
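The three modifications can be illustrated with a minimal pure-Python sketch of one attention layer. This is not the repository's code. The function names (`rope`, `sliding_window_mask`, `kv_head_for`, `attention`), the RoPE base of 10000, and the adjacent-pair rotation convention are all assumptions made for illustration. The sketch also assumes the value dimension equals the head dimension. A real model would vectorize all of this with a tensor library.

```python
import math


def rope(x, pos, base=10000.0):
    """Rotary position embedding (assumed convention): rotate each adjacent pair
    (x[j], x[j+1]) by angle pos * base^(-j/d), for j = 0, 2, 4, ..."""
    d = len(x)
    assert d % 2 == 0, "RoPE needs an even head dimension"
    out = []
    for j in range(0, d, 2):
        theta = pos * base ** (-j / d)
        c, s = math.cos(theta), math.sin(theta)
        out.append(x[j] * c - x[j + 1] * s)
        out.append(x[j] * s + x[j + 1] * c)
    return out


def sliding_window_mask(seq_len, window):
    """mask[q][k] is True when query q may attend to key k:
    causal (k <= q) and within the last `window` positions (q - k < window)."""
    return [[(k <= q) and (q - k < window) for k in range(seq_len)]
            for q in range(seq_len)]


def kv_head_for(query_head, n_q_heads, n_kv_heads):
    """Grouped query attention: each consecutive group of
    n_q_heads // n_kv_heads query heads shares one key/value head."""
    assert n_q_heads % n_kv_heads == 0
    return query_head // (n_q_heads // n_kv_heads)


def attention(q_heads, k_heads, v_heads, window):
    """q_heads: n_q_heads lists of seq_len query vectors.
    k_heads, v_heads: n_kv_heads lists of seq_len vectors.
    Returns n_q_heads lists of seq_len output vectors. Requires window >= 1."""
    n_q, n_kv = len(q_heads), len(k_heads)
    seq_len, d = len(q_heads[0]), len(q_heads[0][0])
    mask = sliding_window_mask(seq_len, window)
    outputs = []
    for h in range(n_q):
        kv = kv_head_for(h, n_q, n_kv)  # shared K/V head for this query head
        qs = [rope(q, p) for p, q in enumerate(q_heads[h])]
        ks = [rope(k, p) for p, k in enumerate(k_heads[kv])]
        vs = v_heads[kv]
        head_out = []
        for i in range(seq_len):
            # Scaled dot-product scores; masked positions get -inf.
            scores = [sum(a * b for a, b in zip(qs[i], ks[j])) / math.sqrt(d)
                      if mask[i][j] else -math.inf
                      for j in range(seq_len)]
            m = max(scores)  # finite, since position i always sees itself
            weights = [math.exp(s - m) for s in scores]  # exp(-inf) == 0.0
            total = sum(weights)
            head_out.append([sum(w * v[c] for w, v in zip(weights, vs)) / total
                             for c in range(d)])
        outputs.append(head_out)
    return outputs
```

With `window=1` every token attends only to itself, so each output row equals that position's value vector. With 8 query heads and 2 key/value heads, `kv_head_for` maps query heads 0 to 3 onto K/V head 0 and query heads 4 to 7 onto K/V head 1. That sharing is what shrinks the key/value cache relative to standard multi-head attention.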