Efficient GPU support for LLM inference with x-bit quantization (e.g. FP6, FP5).