Chaojian Li's starred repositories

Fov-3DGS

Official Implementation of RTGS: Enabling Real-Time Gaussian Splatting on Mobile Devices Using Efficiency-Guided Pruning and Foveated Rendering.

Language: Python | License: MIT | Stargazers: 34 | Issues: 0

ServerlessLLM

Cost-efficient and fast multi-LLM serving.

Language: Python | License: Apache-2.0 | Stargazers: 155 | Issues: 0

ThunderKittens

Tile primitives for speedy kernels

Language: Cuda | License: MIT | Stargazers: 1449 | Issues: 0

grayskull-attention

Attention in SRAM on Tenstorrent Grayskull 🤘

Language: TeX | Stargazers: 14 | Issues: 0

QuaRot

Code for QuaRot, an end-to-end 4-bit inference scheme for large language models.

Language: Python | License: Apache-2.0 | Stargazers: 240 | Issues: 0

unsloth

Finetune Llama 3.1, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory

Language: Python | License: Apache-2.0 | Stargazers: 14547 | Issues: 0

ac_math

Algorithmic C Math Library

Language: C++ | License: Apache-2.0 | Stargazers: 58 | Issues: 0

LLM4HWDesign_Starting_Toolkit

LLM4HWDesign Starting Toolkit

Language: Python | Stargazers: 14 | Issues: 0

ACT

[ICML 2024] Unveiling and Harnessing Hidden Attention Sinks: Enhancing Large Language Models without Training through Attention Calibration

Language: Python | Stargazers: 10 | Issues: 0

Grendel-GS

Ongoing research on training Gaussian splatting at scale with a distributed system

Language: Python | License: Apache-2.0 | Stargazers: 300 | Issues: 0

cs249r_book

Collaborative book Machine Learning Systems

Language: TeX | License: NOASSERTION | Stargazers: 750 | Issues: 0

3D-Carbon

3D-Carbon: An Analytical Carbon Modeling Tool for 3D and 2.5D Integrated Circuits

Language: Jupyter Notebook | Stargazers: 4 | Issues: 0

LogarithmicPosit

[DAC'24] Official Implementation of the Logarithmic Posit (LP) Number System

License: MIT | Stargazers: 2 | Issues: 0

Edge-LLM

[DAC 2024] EDGE-LLM: Enabling Efficient Large Language Model Adaptation on Edge Devices via Layerwise Unified Compression and Adaptive Layer Tuning and Voting

Language: Python | Stargazers: 17 | Issues: 0

DeepSeek-V2

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

License: MIT | Stargazers: 3269 | Issues: 0

ShiftAddLLM

ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization

Language: Python | License: Apache-2.0 | Stargazers: 61 | Issues: 0

arxiv-latex-cleaner

arXiv LaTeX Cleaner: Easily clean the LaTeX code of your paper to submit to arXiv

Language: Python | License: Apache-2.0 | Stargazers: 5146 | Issues: 0

TensorRT-Model-Optimizer

TensorRT Model Optimizer is a unified library of state-of-the-art model optimization techniques such as quantization and sparsity. It compresses deep learning models for downstream deployment frameworks like TensorRT-LLM or TensorRT to optimize inference speed on NVIDIA GPUs.

Language: Python | License: NOASSERTION | Stargazers: 390 | Issues: 0

Language: Python | License: MIT | Stargazers: 23 | Issues: 0

Awesome-LLM-Inference

📖 A curated list of awesome LLM inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention, etc.

License: GPL-3.0 | Stargazers: 2280 | Issues: 0

tiny-gpu

A minimal GPU design in Verilog to learn how GPUs work from the ground up

Language: SystemVerilog | Stargazers: 6829 | Issues: 0

llama3

The official Meta Llama 3 GitHub site

Language: Python | License: NOASSERTION | Stargazers: 25644 | Issues: 0

gpt-fast

Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.

Language: Python | License: BSD-3-Clause | Stargazers: 5464 | Issues: 0

Language: C++ | License: Apache-2.0 | Stargazers: 236 | Issues: 0

Language: Python | License: Apache-2.0 | Stargazers: 92 | Issues: 0

Language: Python | License: MIT | Stargazers: 133 | Issues: 0

Language: C++ | Stargazers: 3 | Issues: 0

cutlass

CUDA Templates for Linear Algebra Subroutines

Language: C++ | License: NOASSERTION | Stargazers: 5205 | Issues: 0

warp

A Python framework for high performance GPU simulation and graphics

Language: Python | License: NOASSERTION | Stargazers: 4016 | Issues: 0

KIVI

KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache

Language: Python | License: MIT | Stargazers: 191 | Issues: 0