So LLMs are practically everywhere right now, and most of us know how they work, but we don't understand how they work. Yes, there is a difference... That's why I'm making this repo: to compile and implement everything and every technique associated with LLMs in one place. Basically a clusterfuck of everything related to (L)LM training and serving. We'll use a LLama-like model for the whole thing, like a minimal af LLama.
Model:
- Add arXiv Dataset
- Add RoPE Embedding Module (see the sketch after this list)
- Add PicoLLama Model
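For reference, here's roughly what the RoPE module could look like in PyTorch. This is a minimal sketch: the `RotaryEmbedding` name, the interleaved-pair layout, and the `(batch, seq, heads, head_dim)` input shape are my assumptions, not a fixed API for this repo.

```python
import torch

class RotaryEmbedding(torch.nn.Module):
    """Rotary position embeddings: rotate each (even, odd) feature pair
    by an angle that depends on the token position and pair index."""

    def __init__(self, head_dim: int, base: float = 10000.0):
        super().__init__()
        # One inverse frequency per feature pair.
        inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
        self.register_buffer("inv_freq", inv_freq)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, n_heads, head_dim)
        seq_len = x.shape[1]
        t = torch.arange(seq_len, device=x.device, dtype=self.inv_freq.dtype)
        freqs = torch.outer(t, self.inv_freq)   # (seq_len, head_dim // 2)
        cos = freqs.cos()[None, :, None, :]     # broadcast over batch and heads
        sin = freqs.sin()[None, :, None, :]
        x1, x2 = x[..., 0::2], x[..., 1::2]
        # 2D rotation of each (x1, x2) pair, then re-interleave.
        rotated = torch.stack((x1 * cos - x2 * sin, x1 * sin + x2 * cos), dim=-1)
        return rotated.flatten(-2).type_as(x)
```

You'd apply this to the query and key tensors right before the attention score computation; values are left untouched.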
Training:
- Add Training Script
- Add QLoRA
- Add Distributed Training
- Add Mixed-Precision Training (see the sketch after this list)
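Mixed-precision training in PyTorch mostly boils down to an autocast forward pass plus a gradient scaler, so fp16 activations don't make gradients underflow. A minimal sketch follows; the `train_step_amp` helper and its arguments are hypothetical, just to show the moving parts:

```python
import torch
import torch.nn.functional as F

def train_step_amp(model, optimizer, scaler, tokens, targets):
    """One mixed-precision training step for a causal LM."""
    optimizer.zero_grad(set_to_none=True)
    # Forward pass runs in float16 where it's safe; loss math stays float32.
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        logits = model(tokens)  # (batch, seq_len, vocab)
        loss = F.cross_entropy(logits.view(-1, logits.size(-1)), targets.view(-1))
    scaler.scale(loss).backward()  # scale loss so fp16 grads don't underflow
    scaler.step(optimizer)         # unscales grads; skips the step on inf/nan
    scaler.update()                # adjusts the scale factor for the next step
    return loss.item()

# Usage: create the scaler once, reuse it every step.
# scaler = torch.cuda.amp.GradScaler()
# loss = train_step_amp(model, optimizer, scaler, tokens, targets)
```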
Serving:
- Add KV Caching (see the decoding sketch after this list)
- Add Speculative Decoding
- Add Quantization
- Add Flash Attention v2
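KV caching is what keeps autoregressive decoding from being quadratic in practice: run the prompt once, stash every layer's keys and values, then feed only the newest token on each step. A minimal greedy-decoding sketch, assuming the model's forward takes and returns a `past_kv` list of per-layer `(key, value)` tensors; that interface is an assumption on my part, not the actual PicoLLama API:

```python
import torch

@torch.no_grad()
def generate(model, prompt_ids: torch.Tensor, max_new_tokens: int) -> torch.Tensor:
    # Prefill: run the full prompt once and keep the per-layer K/V cache.
    logits, past_kv = model(prompt_ids, past_kv=None)
    out = prompt_ids
    for _ in range(max_new_tokens):
        # Greedy pick from the last position's logits.
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        out = torch.cat([out, next_id], dim=1)
        # Decode step: only the new token goes in; the cache covers the rest.
        logits, past_kv = model(next_id, past_kv=past_kv)
    return out
```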