Beast code in Giters

📖 A curated list of Awesome LLM Inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention, etc.

License: GPL-3.0 | 2581 86 6

LGM

[ECCV 2024 Oral] LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation.

Language: Python | License: MIT | 1591 34 69

cs249r_book

Collaborative book: Machine Learning Systems

Language: TeX | License: NOASSERTION | 1005 13 227

TensorRT Model Optimizer is a unified library of state-of-the-art model optimization techniques such as quantization, sparsity, distillation, etc. It compresses deep learning models for downstream deployment frameworks like TensorRT-LLM or TensorRT to optimize inference speed on NVIDIA GPUs.

Language: Python | License: NOASSERTION | 457 14 73

tinymembench

Simple benchmark for memory throughput and latency

Language: C | License: MIT | 350 30 16

Grendel-GS

Ongoing research on training Gaussian splatting at scale with a distributed system

Language: Python | License: Apache-2.0 | 338 17 23

owl

Language: C++ | License: Apache-2.0 | 239 8 133

KIVI

KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache

Language: Python | License: MIT | 218 5 24

nerfbaselines

Reproducible evaluation of NeRF methods

Language: Python | License: MIT | 152 3 9

research-career-tools

Language: Python | License: MIT | 135 40

SA-GS

Language: Python | License: Apache-2.0 | 99 4 8

ShiftAddLLM

ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization

Language: Python | License: Apache-2.0 | 84 3 5

Magicube

Magicube is a high-performance library for quantized sparse matrix operations (SpMM and SDDMM) of deep learning on Tensor Cores.

Language: C++ | License: GPL-3.0 | 80 4 2

Edge-LLM

[DAC 2024] EDGE-LLM: Enabling Efficient Large Language Model Adaptation on Edge Devices via Layerwise Unified Compression and Adaptive Layer Tuning and Voting

Language: Python | 23 4 2

ACT

[ICML 2024] Unveiling and Harnessing Hidden Attention Sinks: Enhancing Large Language Models without Training through Attention Calibration

Language: Python | 1600

mg-verilog

Language: Python | License: MIT | 400

3D-Carbon

3D-Carbon: An Analytical Carbon Modeling Tool for 3D and 2.5D Integrated Circuits

Language: Jupyter Notebook | 400

ray-tracing-in-cuda

Language: C++ | 3 10

LogarithmicPosit

[DAC'24] Official Implementation of the Logarithmic Posit (LP) Number System

License: MIT | 200

licj15

Chaojian Li's starred repositories

llama3

starter-workflows

tiny-gpu

gpt-fast

cutlass

arxiv-latex-cleaner

dust3r

warp

FluidX3D

DeepSeek-V2

pbrt-v4

Awesome-LLM-Inference

LGM