PotatoSpudowski / fastLLaMa

fastLLaMa: An experimental high-performance framework for running Decoder-only LLMs with 4-bit quantization in Python using a C/C++ backend.

Home Page:https://potatospudowski.github.io/fastLLaMa/

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

PotatoSpudowski/fastLLaMa Issues