dominiquegarmier / grok-pytorch

pytorch implementation of grok

grok-pytorch

My best attempt at implementing Grok in PyTorch.

What is Grok?

Grok (also called Grok-1) is a recently open-sourced mixture-of-experts language model released by xAI.
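To illustrate the mixture-of-experts idea, here is a minimal sketch of top-k expert routing in PyTorch. All names, sizes, and the expert MLP structure are illustrative assumptions, not details taken from the actual Grok-1 architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MoELayer(nn.Module):
    """Hypothetical top-k mixture-of-experts layer (illustrative only)."""

    def __init__(self, dim: int, n_experts: int, top_k: int = 2) -> None:
        super().__init__()
        self.top_k = top_k
        # Router scores each token against every expert.
        self.router = nn.Linear(dim, n_experts, bias=False)
        # Each expert is a small feed-forward block (structure assumed).
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, dim) -> flatten so routing is per token
        tokens = x.reshape(-1, x.shape[-1])
        logits = self.router(tokens)                    # (tokens, n_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)  # keep top-k experts per token
        weights = F.softmax(weights, dim=-1)            # normalize over the chosen experts
        out = torch.zeros_like(tokens)
        for e, expert in enumerate(self.experts):
            mask = idx == e                             # which tokens picked expert e
            token_ids, slot = mask.nonzero(as_tuple=True)
            if token_ids.numel() == 0:
                continue
            # Weighted contribution of expert e to its routed tokens.
            out[token_ids] += weights[token_ids, slot, None] * expert(tokens[token_ids])
        return out.reshape(x.shape)
```

Only `top_k` of the `n_experts` feed-forward blocks run per token, which is what lets MoE models grow parameter count without a proportional increase in compute.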

Disclaimer

  • This implementation is not intended to be run. It is meant as a reference for understanding the architecture of the Grok model, which is also why I wrote it. Personally, I find it easier to reason about a model architecture when tensor shapes are provided via type hints.
  • There are probably still bugs in this implementation, as I have not tested it extensively, and I may have missed aspects of the architecture. It should nonetheless give you an idea of how Grok works.
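As a small example of the shape-annotation style mentioned above, tensor shapes can be documented directly in the signature. The function and comment convention here are illustrative, not copied from this repository:

```python
import torch
from torch import Tensor


def attention_scores(
    q: Tensor,  # (batch, heads, seq, head_dim)
    k: Tensor,  # (batch, heads, seq, head_dim)
) -> Tensor:    # (batch, heads, seq, seq)
    """Scaled dot-product attention scores, with shapes noted inline."""
    return (q @ k.transpose(-2, -1)) / (q.shape[-1] ** 0.5)
```

Reading the expected shape next to each argument makes it much easier to check that matrix multiplications and reshapes line up.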

Attributions

Citations

RoFormer: Enhanced Transformer with Rotary Position Embedding

@misc{su2023roformer,
      title={RoFormer: Enhanced Transformer with Rotary Position Embedding},
      author={Jianlin Su and Yu Lu and Shengfeng Pan and Ahmed Murtadha and Bo Wen and Yunfeng Liu},
      year={2023},
      eprint={2104.09864},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Attention Is All You Need

@misc{vaswani2017attention,
      title={Attention Is All You Need},
      author={Ashish Vaswani and Noam Shazeer and Niki Parmar and Jakob Uszkoreit and Llion Jones and Aidan N. Gomez and Lukasz Kaiser and Illia Polosukhin},
      year={2017},
      eprint={1706.03762},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Root Mean Square Layer Normalization

@misc{zhang2019root,
      title={Root Mean Square Layer Normalization},
      author={Biao Zhang and Rico Sennrich},
      year={2019},
      eprint={1910.07467},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

License: MIT License
