Nianhui Guo's repositories
bitorch-engine
A toolkit that enhances PyTorch with specialized functions for low-bit quantized neural networks.
AQLM
Official PyTorch repository for "Extreme Compression of Large Language Models via Additive Quantization": https://arxiv.org/pdf/2401.06118.pdf
Awesome-LLM-Compression
Awesome LLM compression research papers and tools.
BitBLAS
BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.
buffer-of-thought-llm
Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models
ETO
Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents
evolutionary-model-merge
Official repository of Evolutionary Optimization of Model Merging Recipes
fast-hadamard-transform
Fast Hadamard transform in CUDA, with a PyTorch interface
gpt-fast
Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.
hqq
Official implementation of Half-Quadratic Quantization (HQQ)
KIVI
KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache
KVQuant
KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
llamafile
Distribute and run LLMs with a single file.
MiniMA
Code for the paper "Towards the Law of Capacity Gap in Distilling Language Models"
mixtral-offloading
Run Mixtral-8x7B models in Colab or on consumer desktops
octopus-v4
AI for all: building a large graph of language models
Pruner-Zero
Evolving Symbolic Pruning Metric from scratch
QQQ
QQQ is a hardware-optimized W4A8 quantization solution.
ShiftAddLLM
ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization
transformerlab-app
Experiment with Large Language Models
UDR
ACL'23: Unified Demonstration Retriever for In-Context Learning