cccpr (brisker)


Location: SJTU, China

cccpr's starred repositories

llama.cpp

LLM inference in C/C++

vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Language: Python | License: Apache-2.0 | Stargazers: 22701 | Issues: 206 | Issues: 3407

mlc-llm

Universal LLM Deployment Engine with ML Compilation

Language: Python | License: Apache-2.0 | Stargazers: 17756 | Issues: 167 | Issues: 1174

lantern

Lantern official version download, Lantern, bypass the firewall, proxy, unrestricted internet access, external network, accelerator, ladder, router - Fast, reliable and secure access to the open internet - lantern proxy vpn censorship-circumvention censorship gfw accelerator - Lantern proxy: anti-censorship, secure, reliable and fast

Language: Go | Stargazers: 14350 | Issues: 496 | Issues: 0

flash-attention

Fast and memory-efficient exact attention

Language: Python | License: BSD-3-Clause | Stargazers: 11963 | Issues: 103 | Issues: 872

TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.

Language: C++ | License: Apache-2.0 | Stargazers: 7438 | Issues: 85 | Issues: 1562

lm-evaluation-harness

A framework for few-shot evaluation of language models.

Language: Python | License: MIT | Stargazers: 5827 | Issues: 37 | Issues: 929

gpt-fast

Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.

Language: Python | License: BSD-3-Clause | Stargazers: 5362 | Issues: 63 | Issues: 93

Baichuan2

A series of large language models developed by Baichuan Intelligent Technology

Language: Python | License: Apache-2.0 | Stargazers: 4031 | Issues: 40 | Issues: 386

lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

Language: Python | License: Apache-2.0 | Stargazers: 3268 | Issues: 30 | Issues: 998

opencompass

OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.

Language: Python | License: Apache-2.0 | Stargazers: 3236 | Issues: 21 | Issues: 402

GPTQ-for-LLaMa

4-bit quantization of LLaMA using GPTQ

Language: Python | License: Apache-2.0 | Stargazers: 2945 | Issues: 42 | Issues: 216

AutoAWQ

AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference.

Language: Python | License: MIT | Stargazers: 1433 | Issues: 11 | Issues: 332

smoothquant

[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models

Language: Python | License: MIT | Stargazers: 1102 | Issues: 19 | Issues: 81

test

Measuring Massive Multitask Language Understanding | ICLR 2021

Language: Python | License: MIT | Stargazers: 1060 | Issues: 20 | Issues: 19

ToMe

A method to increase the speed and lower the memory footprint of existing vision transformers.

Language: Python | License: NOASSERTION | Stargazers: 902 | Issues: 113 | Issues: 34

Awesome-LLM-Compression

Awesome LLM compression research papers and tools.

llama-chat

Chat with Meta's LLaMA models at home made easy

Language: Python | License: GPL-3.0 | Stargazers: 832 | Issues: 11 | Issues: 34

MQBench

Model Quantization Benchmark

Language: Shell | License: Apache-2.0 | Stargazers: 739 | Issues: 14 | Issues: 196

QuIP

Code for paper: "QuIP: 2-Bit Quantization of Large Language Models With Guarantees"

hessian

Hessian in PyTorch

QUIK

Repository for the QUIK project, enabling the use of 4-bit kernels for generative inference

Language: C++ | License: Apache-2.0 | Stargazers: 161 | Issues: 6 | Issues: 6

torch-int

This repository contains integer operators on GPUs for PyTorch.

Language: Python | License: MIT | Stargazers: 156 | Issues: 2 | Issues: 21

PB-LLM

PB-LLM: Partially Binarized Large Language Models

Language: Python | License: MIT | Stargazers: 139 | Issues: 3 | Issues: 5

Outlier_Suppression_Plus

Official implementation of the EMNLP 2023 paper: Outlier Suppression+: Accurate quantization of large language models by equivalent and optimal shifting and scaling

Language: Python | License: MIT | Stargazers: 35 | Issues: 8 | Issues: 6

ReSTE

Official implementation of Rectified Straight Through Estimator (ReSTE).

AFPQ

AFPQ code implementation

Language: Python | License: MIT | Stargazers: 15 | Issues: 0 | Issues: 0

llm-mixed-q

Mixed-precision quantization for LLMs

Language: Python | License: Apache-2.0 | Stargazers: 12 | Issues: 2 | Issues: 0