kaix90's starred repositories
neural-compressor
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
Awesome-LLM-Inference
📖 A curated list of awesome LLM inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, continuous batching, FlashAttention, PagedAttention, etc.
Awesome-Efficient-LLM
A curated list of work on efficient large language models.
Awesome-LLM-Compression
Awesome LLM compression research papers and tools.
Awesome-LLM-Long-Context-Modeling
📰 Must-read papers and blogs on LLM-based long-context modeling 🔥
yolov5-5.x-annotations
A Chinese-annotated version based on yolov5-5.0!
Vehicle-Detection-and-Tracking
Computer-vision-based vehicle detection and tracking using the TensorFlow Object Detection API and Kalman filtering
optimum-intel
🤗 Optimum Intel: Accelerate inference with Intel optimization tools
TensorRT-Model-Optimizer
TensorRT Model Optimizer is a unified library of state-of-the-art model optimization techniques such as quantization and sparsity. It compresses deep learning models for downstream deployment frameworks like TensorRT-LLM or TensorRT to optimize inference speed on NVIDIA GPUs.
llm-analysis
Latency and Memory Analysis of Transformer Models for Training and Inference
aisys-building-blocks
Building blocks for foundation models.
algorithm-study
Algorithm notes and templates (written in Python, Go, and TypeScript)
applied-ai
Applied AI experiments and examples for PyTorch
Machine-Learning-Explained
Learn the theory, math and code behind different machine learning algorithms and techniques.
Sparse-IFT
Official repository for "Sparse Iso-FLOP Transformations for Maximizing Training Efficiency"
Sparse-GPT-Finetuning
Code for my ICLR 2024 TinyPapers paper "Prune and Tune: Improving Efficient Pruning Techniques for Massive Language Models"