Efficient GPU support for LLM inference with x-bit quantization (e.g., FP6, FP5).
Public repository for Hugging Face (HF) blog posts
Webpage for FSA
Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity
Conference talks given by FSA Lab, University of Sydney