cccpr's starred repositories
flash-attention
Fast and memory-efficient exact attention
TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
lm-evaluation-harness
A framework for few-shot evaluation of language models.
opencompass
OpenCompass is an LLM evaluation platform supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMA2, Qwen, GLM, Claude, etc.) on over 100 datasets.
GPTQ-for-LLaMa
4-bit quantization of LLaMA using GPTQ
smoothquant
[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
Awesome-LLM-Compression
Awesome LLM compression research papers and tools.
llama-chat
Chat with Meta's LLaMA models at home, made easy
Outlier_Suppression_Plus
Official implementation of the EMNLP 2023 paper: Outlier Suppression+: Accurate quantization of large language models by equivalent and optimal shifting and scaling
llm-mixed-q
Mixed-precision quantization for LLMs