qnkhuat / llama2.rs

minimal llama2 in rust


Llama 2 inference in one file of pure Rust

Inspired by https://github.com/karpathy/llama2.c

To run

Download the weights

wget https://huggingface.co/karpathy/tinyllamas/resolve/main/stories15M.bin

Build the binary

rustc run.rs -C opt-level=3 -o run.out

Run inference

./run.out stories15M.bin
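The stories15M.bin checkpoint uses the llama2.c file format: it starts with a header of seven little-endian i32 config fields (dim, hidden_dim, n_layers, n_heads, n_kv_heads, vocab_size, seq_len), followed by the flattened weights. A hedged sketch of parsing that header (field names follow llama2.c; this is not the actual run.rs code, and the header values in main are illustrative):

```rust
use std::convert::TryInto;

// Config fields as laid out at the start of a llama2.c checkpoint,
// in file order. Names taken from llama2.c's Config struct.
#[derive(Debug, PartialEq)]
struct Config {
    dim: i32,
    hidden_dim: i32,
    n_layers: i32,
    n_heads: i32,
    n_kv_heads: i32,
    vocab_size: i32,
    seq_len: i32,
}

// Parse the 7 little-endian i32 header fields from the first 28 bytes.
fn parse_config(bytes: &[u8]) -> Config {
    let f = |i: usize| i32::from_le_bytes(bytes[i * 4..i * 4 + 4].try_into().unwrap());
    Config {
        dim: f(0),
        hidden_dim: f(1),
        n_layers: f(2),
        n_heads: f(3),
        n_kv_heads: f(4),
        vocab_size: f(5),
        seq_len: f(6),
    }
}

fn main() {
    // Fabricated header bytes for illustration only.
    let mut bytes = Vec::new();
    for v in [288i32, 768, 6, 6, 6, 32000, 256] {
        bytes.extend_from_slice(&v.to_le_bytes());
    }
    let cfg = parse_config(&bytes);
    println!("{:?}", cfg);
}
```

After the header, the remaining bytes are contiguous f32 weight tensors whose sizes follow from these config values.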

Performance

The very first version generated only ~5.2 tok/s on my M1 Max; the optimizations below bring it to ~111 tok/s, which still leaves plenty of room for improvement.

Commit    Tok/s   Remarks
3774d76   111     Use a 1-D vector instead of a multi-dimensional vector
1a3b83e   75      Build with opt-level=3
ab08f7e   5.2     The very first version
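The biggest win in the table, the 1-D flattening in 3774d76, can be sketched as follows (function and parameter names are assumptions, not the actual run.rs code). A matrix stored as Vec<Vec<f32>> costs a pointer chase per row, while a flat Vec<f32> indexed as row * cols + col keeps the data contiguous and cache-friendly:

```rust
// Matrix-vector multiply over a row-major matrix stored in one flat slice.
// w has shape rows x cols; out[r] = dot(w[r, :], x).
fn matmul_flat(out: &mut [f32], x: &[f32], w: &[f32], rows: usize, cols: usize) {
    for r in 0..rows {
        let mut sum = 0.0f32;
        for c in 0..cols {
            // Flat index replaces the nested w[r][c] lookup.
            sum += w[r * cols + c] * x[c];
        }
        out[r] = sum;
    }
}

fn main() {
    // 2x3 matrix [[1,2,3],[4,5,6]] times vector [1,1,1].
    let w = vec![1.0f32, 2.0, 3.0, 4.0, 5.0, 6.0];
    let x = vec![1.0f32, 1.0, 1.0];
    let mut out = vec![0.0f32; 2];
    matmul_flat(&mut out, &x, &w, 2, 3);
    println!("{:?}", out); // [6.0, 15.0]
}
```

This also matches the checkpoint layout, which stores each weight tensor as one contiguous run of f32s, so the flat slice can be used directly without restructuring.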



Languages

Rust 99.7%, Shell 0.3%