Nianhui Guo's repositories
BitNet
Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in PyTorch.
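The core of BitNet is a BitLinear layer that binarizes weights to {-1, +1} while staying trainable via a straight-through estimator. A minimal sketch of that idea (the paper also quantizes activations and adds normalization, which this omits):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BitLinear(nn.Linear):
    """Drop-in nn.Linear with 1-bit (sign) weights, per the BitNet idea."""
    def forward(self, x):
        w = self.weight
        # Zero-center, then binarize; alpha rescales so the binarized
        # weights preserve the original magnitude on average.
        w_centered = w - w.mean()
        alpha = w_centered.abs().mean()
        w_bin = torch.sign(w_centered) * alpha
        # Straight-through estimator: forward uses binarized weights,
        # backward passes gradients to the full-precision weights.
        w_q = w_centered + (w_bin - w_centered).detach()
        return F.linear(x, w_q, self.bias)
```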
AQLM
Official PyTorch repository for "Extreme Compression of Large Language Models via Additive Quantization" (https://arxiv.org/pdf/2401.06118.pdf).
ColossalAI
Making big AI models cheaper, easier, and scalable
FastChat
The release repo for "Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90% ChatGPT Quality".
gpt-fast
Simple and efficient PyTorch-native transformer text generation in <1000 lines of Python.
hqq
Official implementation of Half-Quadratic Quantization (HQQ)
KIVI
KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache
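KIVI's key observation is asymmetry in the KV cache: Keys are best quantized per-channel and Values per-token. A simplified sketch of that 2-bit scheme, where the two cases only differ in which axis shares the scale and zero-point:

```python
import torch

def quantize_2bit(x, dim):
    """Asymmetric 2-bit quantization with scale/zero-point shared along `dim`."""
    xmin = x.amin(dim=dim, keepdim=True)
    xmax = x.amax(dim=dim, keepdim=True)
    scale = (xmax - xmin).clamp(min=1e-8) / 3      # 2 bits -> levels 0..3
    q = ((x - xmin) / scale).round().clamp(0, 3)
    return q.to(torch.uint8), scale, xmin          # dequant: q * scale + xmin

k, v = torch.randn(128, 64), torch.randn(128, 64)  # (tokens, channels)
k_q = quantize_2bit(k, dim=0)  # Keys: one scale per channel
v_q = quantize_2bit(v, dim=1)  # Values: one scale per token
```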
Lion-vs-Adam
A comparison of the Lion and Adam optimizers.
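Unlike Adam, Lion (Chen et al., 2023) only takes the sign of an interpolated momentum, so every coordinate moves by exactly ±lr. A sketch of one update step:

```python
import torch

@torch.no_grad()
def lion_step(param, grad, momentum, lr=1e-4, beta1=0.9, beta2=0.99, wd=0.0):
    """One Lion update: sign of interpolated momentum plus decoupled weight decay."""
    update = torch.sign(beta1 * momentum + (1 - beta1) * grad)
    param.mul_(1 - lr * wd).add_(update, alpha=-lr)   # decoupled weight decay
    momentum.mul_(beta2).add_(grad, alpha=1 - beta2)  # EMA of gradients
```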
LLaMA-Efficient-Tuning
Easy-to-use fine-tuning framework built on PEFT (pre-training, SFT, and RLHF with QLoRA).
LLM-Pruner
LLM-Pruner: On the Structural Pruning of Large Language Models
MiniMA
Code for the paper "Towards the Law of Capacity Gap in Distilling Language Models".
mixtral-offloading
Run Mixtral-8x7B models in Colab or on consumer desktops.
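The enabling trick is that a sparse MoE only activates a few experts per token, so most experts can live in CPU RAM and be pulled onto the GPU on demand. A loose sketch of that idea with an LRU cache (the actual repo adds quantization and speculative prefetching):

```python
from collections import OrderedDict

class ExpertCache:
    """Keep at most `capacity` MoE experts on the GPU; offload the rest to CPU."""
    def __init__(self, cpu_experts, capacity=4, device="cuda"):
        self.cpu_experts = cpu_experts   # {expert_id: nn.Module on CPU}
        self.capacity = capacity
        self.device = device
        self.gpu = OrderedDict()         # expert_id -> module on GPU (LRU order)

    def get(self, expert_id):
        if expert_id in self.gpu:
            self.gpu.move_to_end(expert_id)          # mark as recently used
        else:
            if len(self.gpu) >= self.capacity:       # evict least recently used
                evicted_id, evicted = self.gpu.popitem(last=False)
                self.cpu_experts[evicted_id] = evicted.to("cpu")
            self.gpu[expert_id] = self.cpu_experts[expert_id].to(self.device)
        return self.gpu[expert_id]
```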
MS-AMP
Microsoft Automatic Mixed Precision Library
NBCE
Naive Bayes-based Context Extension
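NBCE extends an LLM's context by running the model on context chunks in parallel and pooling the next-token predictions under a naive-Bayes independence assumption: log p(T|S1..Sn) = Σᵢ log p(T|Sᵢ) − (n−1)·log p(T) + const. A minimal sketch of that pooling step:

```python
import torch

def nbce_logits(context_logits, empty_logits):
    """Naive-Bayes pooling of next-token logits.
    context_logits: (n, vocab), one row per context chunk;
    empty_logits:   (vocab,), logits with no context (the prior p(T))."""
    n = context_logits.shape[0]
    log_probs = torch.log_softmax(context_logits, dim=-1)
    log_prior = torch.log_softmax(empty_logits, dim=-1)
    return log_probs.sum(dim=0) - (n - 1) * log_prior
```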
OmniQuant
OmniQuant is a simple and powerful quantization technique for LLMs.
QIGen
Repository for CPU Kernel Generation for LLM Inference
qmoe
Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models".
QuIP-for-Llama
Code for the paper "QuIP: 2-Bit Quantization of Large Language Models With Guarantees", adapted for Llama models.
soft-prompt-tuning
Prompt tuning for GPT-J
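Soft prompt tuning freezes the language model and learns only a handful of continuous "token" embeddings prepended to the input. A minimal sketch of the prepending module:

```python
import torch
import torch.nn as nn

class SoftPrompt(nn.Module):
    """Prepend n learnable soft-token embeddings; only these are trained."""
    def __init__(self, n_tokens, embed_dim):
        super().__init__()
        self.prompt = nn.Parameter(torch.randn(n_tokens, embed_dim) * 0.02)

    def forward(self, input_embeds):             # (batch, seq, embed_dim)
        batch = input_embeds.shape[0]
        prompt = self.prompt.unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([prompt, input_embeds], dim=1)
```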
tiger
A Tight-fisted Optimizer
torch-int
Integer operators on GPUs for PyTorch.