A tiny decoder-only transformer, based on the architecture described in the paper "Attention Is All You Need", written for fun and educational purposes.
- residual connections
- layer normalization
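
As a rough illustration of how these two pieces fit together, here is a minimal pure-Python sketch of the post-norm arrangement from the paper, `LayerNorm(x + Sublayer(x))`. It is not taken from this repository: the names `layer_norm`, `residual_block`, and `toy_sublayer` are hypothetical, the learned scale and shift parameters are omitted, and the actual code may well use the pre-norm ordering `x + Sublayer(LayerNorm(x))` instead, as minGPT does.

```python
import math


def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and (approximately) unit variance.

    Assumption: no learned scale/shift parameters, to keep the sketch short.
    """
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def residual_block(x, sublayer):
    """Post-norm block: add the sublayer output to its input, then normalize."""
    y = sublayer(x)
    return layer_norm([a + b for a, b in zip(x, y)])


def toy_sublayer(x):
    # Hypothetical stand-in for self-attention or the feed-forward network.
    return [2.0 * v for v in x]


out = residual_block([1.0, 2.0, 3.0, 4.0], toy_sublayer)
```

The residual path lets the input pass through unchanged alongside the sublayer output, while the normalization keeps activations on a comparable scale from layer to layer.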
Inspired by the work of A. Karpathy: https://github.com/karpathy/minGPT.
MIT License