kaix90

Company: NVIDIA

Location: Santa Clara, CA

kaix90's starred repositories

nanoGPT

The simplest, fastest repository for training/finetuning medium-sized GPTs.

Language: Python · License: MIT · Stargazers: 34971 · Issues: 353 · Issues: 305

tinygrad

You like pytorch? You like micrograd? You love tinygrad! ❤️

Language: Python · License: MIT · Stargazers: 25220 · Issues: 264 · Issues: 675

vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Language: Python · License: Apache-2.0 · Stargazers: 23298 · Issues: 215 · Issues: 3528

Learn-Vim

Learning Vim and Vimscript doesn't have to be hard. This is the guide that you're looking for 📖

AISystem

AISystem covers full-stack, low-level AI-system technologies, including AI chips, AI compilers, and AI inference and training frameworks.

Language: Jupyter Notebook · License: Apache-2.0 · Stargazers: 9657 · Issues: 135 · Issues: 31

gpt-fast

Simple and efficient pytorch-native transformer text generation in <1000 LOC of python.

Language: Python · License: BSD-3-Clause · Stargazers: 5376 · Issues: 64 · Issues: 96

llm-awq

[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

Language: Python · License: MIT · Stargazers: 2150 · Issues: 24 · Issues: 159

neural-compressor

SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) and sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime

Language: Python · License: Apache-2.0 · Stargazers: 2086 · Issues: 36 · Issues: 191

Awesome-LLM-Inference

📖 A curated list of awesome LLM inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention, etc.

Awesome-Efficient-LLM

A curated list of work on efficient large language models

Awesome-LLM-Compression

Awesome LLM compression research papers and tools.

Awesome-LLM-Long-Context-Modeling

📰 Must-read papers and blog posts on LLM-based long-context modeling 🔥

yolov5-5.x-annotations

A Chinese-annotated version of yolov5-5.0!

Vehicle-Detection-and-Tracking

Computer vision based vehicle detection and tracking using Tensorflow Object Detection API and Kalman-filtering

Language: Python · License: Apache-2.0 · Stargazers: 515 · Issues: 21 · Issues: 21

motpy

Library for tracking-by-detection multi-object tracking, implemented in Python

Language: Python · License: MIT · Stargazers: 488 · Issues: 19 · Issues: 23

ao

Custom data types and layouts for training and inference

Language: Python · License: BSD-3-Clause · Stargazers: 428 · Issues: 26 · Issues: 88

optimum-intel

🤗 Optimum Intel: Accelerate inference with Intel optimization tools

Language: Jupyter Notebook · License: Apache-2.0 · Stargazers: 361 · Issues: 37 · Issues: 76

qserve

QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving

Language: Python · License: Apache-2.0 · Stargazers: 350 · Issues: 8 · Issues: 22

TensorRT-Model-Optimizer

TensorRT Model Optimizer is a unified library of state-of-the-art model optimization techniques such as quantization and sparsity. It compresses deep learning models for downstream deployment frameworks like TensorRT-LLM or TensorRT to accelerate inference on NVIDIA GPUs.

Language: Python · License: NOASSERTION · Stargazers: 328 · Issues: 12 · Issues: 44

llm-analysis

Latency and Memory Analysis of Transformer Models for Training and Inference

Language: Python · License: Apache-2.0 · Stargazers: 311 · Issues: 8 · Issues: 9

aisys-building-blocks

Building blocks for foundation models.

algorithm-study

Algorithm notes and templates (written in Python, Go, and TypeScript)

KIVI

KIVI: A Tuning-Free Asymmetric 2-bit Quantization for KV Cache

Language: Python · License: MIT · Stargazers: 173 · Issues: 3 · Issues: 21

applied-ai

Applied AI experiments and examples for PyTorch

Language: Python · License: BSD-3-Clause · Stargazers: 83 · Issues: 11 · Issues: 8

Machine-Learning-Explained

Learn the theory, math, and code behind different machine learning algorithms and techniques.

Language: Python · License: MIT · Stargazers: 59 · Issues: 5 · Issues: 1

EasyKV

Easy control for Key-Value Constrained Generative LLM Inference (https://arxiv.org/abs/2402.06262)

venom

A Vectorized N:M Format for Unleashing the Power of Sparse Tensor Cores

Language: Python · License: Apache-2.0 · Stargazers: 27 · Issues: 1 · Issues: 5

Sparse-IFT

Official repository of "Sparse ISO-FLOP Transformations for Maximizing Training Efficiency"

Language: Python · License: Apache-2.0 · Stargazers: 18 · Issues: 3 · Issues: 0

Sparse-GPT-Finetuning

Code for the ICLR 2024 Tiny Papers paper "Prune and Tune: Improving Efficient Pruning Techniques for Massive Language Models"

Language: Jupyter Notebook · Stargazers: 10 · Issues: 0 · Issues: 0