kaix90's starred repositories
neural-compressor
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
Awesome-LLM-Inference
📖 A curated list of awesome LLM inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, continuous batching, FlashAttention, PagedAttention, etc.
Awesome-Efficient-LLM
A curated list of work on efficient large language models.
Awesome-LLM-Compression
Awesome LLM compression research papers and tools.
Awesome-LLM-Long-Context-Modeling
📰 Must-read papers and blogs on LLM-based long-context modeling 🔥
yolov5-5.x-annotations
A Chinese-annotated version based on yolov5-5.0!
Vehicle-Detection-and-Tracking
Computer-vision-based vehicle detection and tracking using the TensorFlow Object Detection API and Kalman filtering
optimum-intel
🤗 Optimum Intel: Accelerate inference with Intel optimization tools
TensorRT-Model-Optimizer
TensorRT Model Optimizer is a unified library of state-of-the-art model optimization techniques such as quantization and sparsity. It compresses deep learning models for downstream deployment frameworks like TensorRT-LLM or TensorRT to optimize inference speed on NVIDIA GPUs.
llm-analysis
Latency and Memory Analysis of Transformer Models for Training and Inference
aisys-building-blocks
Building blocks for foundation models.
algorithm-study
Algorithm notes and templates (written in Python, Go, and TypeScript)
applied-ai
Applied AI experiments and examples for PyTorch
Machine-Learning-Explained
Learn the theory, math and code behind different machine learning algorithms and techniques.
Sparse-IFT
Official repository for "Sparse Iso-FLOP Transformations for Maximizing Training Efficiency"
Sparse-GPT-Finetuning
Code for my ICLR 2024 TinyPapers paper "Prune and Tune: Improving Efficient Pruning Techniques for Massive Language Models"