fastLLaMa: An experimental high-performance framework for running Decoder-only LLMs with 4-bit quantization in Python using a C/C++ backend.
Home Page:https://potatospudowski.github.io/fastLLaMa/
Geek Repo:Geek Repo
Github PK Tool:Github PK Tool