LLaMA from First Principles
I wanted to learn more about how transformers work, so I spent a night hacking together an implementation of LLaMA from scratch.
No ML frameworks. No BLAS. Minimal abstractions. Clarity over performance.
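Since there's no BLAS, every matrix product is a hand-written loop. A minimal sketch of what that looks like over row-major float buffers; the names here are illustrative, not taken from the actual code:

```c
#include <stddef.h>

/* out[m][n] = a[m][k] * b[k][n], all row-major.
   The classic naive triple loop: no blocking, no SIMD, just clarity. */
void matmul(float *out, const float *a, const float *b,
            size_t m, size_t k, size_t n) {
    for (size_t i = 0; i < m; i++) {
        for (size_t j = 0; j < n; j++) {
            float acc = 0.0f;
            for (size_t p = 0; p < k; p++) {
                acc += a[i * k + p] * b[p * n + j];
            }
            out[i * n + j] = acc;
        }
    }
}
```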
It isn't done yet, but it's probably more than halfway there.
Remaining pieces
- Rotary Position Embedding (see the RoPE sketch after this list)
- Finish implementing the remaining layer operations
- Fix all the bugs in the matrix-multiplication implementations (a naive baseline is sketched above)
- Token decoding (see the decoding sketch after this list)
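For reference, RoPE rotates each (even, odd) pair of the query and key vectors by an angle that depends on the token position, using the base frequency of 10000 from the RoPE paper. A minimal sketch, assuming `x` is one head's vector and the names are illustrative:

```c
#include <math.h>
#include <stddef.h>

/* Apply RoPE in place: rotate each (x[i], x[i+1]) pair by
   theta = pos * 10000^(-i/dim), a standard 2-D rotation per pair. */
void rope_rotate(float *x, size_t dim, size_t pos) {
    for (size_t i = 0; i < dim; i += 2) {
        float freq  = powf(10000.0f, -(float)i / (float)dim);
        float theta = (float)pos * freq;
        float c = cosf(theta), s = sinf(theta);
        float x0 = x[i], x1 = x[i + 1];
        x[i]     = x0 * c - x1 * s;
        x[i + 1] = x0 * s + x1 * c;
    }
}
```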
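Token decoding is mostly a vocabulary lookup: each sampled token id maps to a string piece, and the pieces get concatenated. A sketch under the assumption of a SentencePiece-style vocab table; the real LLaMA tokenizer also handles byte-fallback tokens and a leading-space marker, which this ignores:

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical vocab table: vocab[id] is the UTF-8 piece for token `id`.
   Decoding a sequence is just printing the pieces back-to-back. */
void decode_tokens(const char **vocab, const int *tokens, size_t n) {
    for (size_t i = 0; i < n; i++) {
        fputs(vocab[tokens[i]], stdout);
    }
}
```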