Nianhui Guo's repositories
bitorch-engine
A toolkit that enhances PyTorch with specialized functions for low-bit quantized neural networks.
AQLM
Official PyTorch repository for "Extreme Compression of Large Language Models via Additive Quantization": https://arxiv.org/pdf/2401.06118.pdf
Awesome-LLM-Compression
Awesome LLM compression research papers and tools.
BitBLAS
BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.
buffer-of-thought-llm
Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models
ETO
Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents
evolutionary-model-merge
Official repository of Evolutionary Optimization of Model Merging Recipes
fast-hadamard-transform
Fast Hadamard transform in CUDA, with a PyTorch interface
gpt-fast
Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.
hqq
Official implementation of Half-Quadratic Quantization (HQQ)
KIVI
KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache
KVQuant
KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
llamafile
Distribute and run LLMs with a single file.
MiniMA
Code for the paper "Towards the Law of Capacity Gap in Distilling Language Models"
mixtral-offloading
Run Mixtral-8x7B models in Colab or on consumer desktops
octopus-v4
AI for all: building a large graph of language models
Pruner-Zero
Evolving Symbolic Pruning Metric from scratch
QQQ
QQQ is a hardware-optimized W4A8 quantization solution.
ShiftAddLLM
ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization
transformerlab-app
Experiment with Large Language Models
UDR
ACL'23: Unified Demonstration Retriever for In-Context Learning