srush/llama2.rs
A fast llama2 decoder in pure Rust.
Stargazers: 1006
Watchers: 11
Issues: 21
Forks: 56
srush/llama2.rs Issues
Support fast GPU processing with Triton (Updated 2 months ago)
Speed comparison (Updated 6 months ago, 2 comments)
The generation speed is superb, but the context is being truncated (Closed 10 months ago, 18 comments)
Exported Models do not load (Closed 10 months ago, 2 comments)
Where is the requirements.export.txt? (Closed 10 months ago, 2 comments)
Tensor has shape torch.Size([448, 1024]) ... this looks incorrect (Updated a year ago, 9 comments)
How to run baby llama? (Updated a year ago, 1 comment)
CodeLlama support (Updated a year ago, 1 comment)
no `TransformerWeights` in `model` (Closed a year ago, 3 comments)
Fabulous! Does it support LLaMA 1 and its derivatives? (Updated a year ago, 2 comments)
Quick Code Review: Auto-vectorization (Updated a year ago, 8 comments)
Why do qzeros need 1 added when unmasked? (Closed a year ago, 2 comments)
Non-mmap'ed weights (Closed a year ago, 2 comments)
License? (Closed a year ago, 1 comment)
README commands don't work (Closed a year ago, 1 comment)
Some llama2 finetunes don't seem to work (Updated a year ago, 2 comments)
Python Versions (Closed a year ago, 4 comments)
Unable to export LLaMa2 model to bin file (Updated a year ago, 1 comment)
Quick review (Updated a year ago, 2 comments)
Minor nitpicks, from a fellow Rust newbie :) (Updated a year ago, 1 comment)
Nice work, some questions (Updated a year ago, 3 comments)