A minimal decoder-only transformer implemented in under 50 lines of PyTorch.
Implementing the Transformer architecture can be challenging for beginners because of its non-trivial information flow (attention, causal masking, etc.). To address this, we offer a stripped-down, "simple as possible" implementation of a decoder-only transformer for pedagogical purposes.
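To make the "non-trivial information flow" concrete, here is a minimal sketch of the core piece, causal self-attention, in PyTorch. This is an illustrative example, not the repository's code: the class and parameter names (`CausalSelfAttention`, `d_model`, `n_heads`, `max_len`) are our own.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttention(nn.Module):
    """Multi-head self-attention with a causal (lower-triangular) mask.

    Illustrative sketch; names and layout are assumptions, not the repo's code.
    """

    def __init__(self, d_model: int, n_heads: int, max_len: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        # One fused projection producing queries, keys, and values.
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.proj = nn.Linear(d_model, d_model)
        # Lower-triangular mask: position t may attend only to positions <= t.
        mask = torch.tril(torch.ones(max_len, max_len, dtype=torch.bool))
        self.register_buffer("mask", mask)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, C = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Split channels into heads: (B, T, C) -> (B, n_heads, T, head_dim).
        q = q.view(B, T, self.n_heads, C // self.n_heads).transpose(1, 2)
        k = k.view(B, T, self.n_heads, C // self.n_heads).transpose(1, 2)
        v = v.view(B, T, self.n_heads, C // self.n_heads).transpose(1, 2)
        # Scaled dot-product attention scores.
        att = (q @ k.transpose(-2, -1)) / (k.size(-1) ** 0.5)
        # Block attention to future positions before the softmax.
        att = att.masked_fill(~self.mask[:T, :T], float("-inf"))
        att = F.softmax(att, dim=-1)
        # Weighted sum of values, then merge heads back to (B, T, C).
        y = (att @ v).transpose(1, 2).reshape(B, T, C)
        return self.proj(y)
```

For example, `CausalSelfAttention(d_model=32, n_heads=4, max_len=16)` maps an input of shape `(batch, seq_len, 32)` to an output of the same shape, with each position's output depending only on itself and earlier positions.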