YangjieZhou's starred repositories

ServerlessLLM

Serverless LLM Serving for Everyone

Language: Python | License: Apache-2.0 | Stargazers: 273

Cute-Learning

Examples of CUDA implementations by Cutlass CuTe

Language: Makefile | License: MIT | Stargazers: 70

torchtitan

A native PyTorch library for large model training

Language: Python | License: BSD-3-Clause | Stargazers: 2526

TidalDecode

TidalDecode: Fast and Accurate LLM Decoding with Position Persistent Sparse Attention

Language: Python | License: Apache-2.0 | Stargazers: 19

ParrotServe

[OSDI'24] Serving LLM-based Applications Efficiently with Semantic Variable

Language: Python | License: MIT | Stargazers: 107

splitwise-sim

LLM serving cluster simulator

Language: Jupyter Notebook | License: MIT | Stargazers: 72

Awesome-LLM-Strawberry

A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 and reasoning techniques.

License: Apache-2.0 | Stargazers: 4878

Language: Python | License: Apache-2.0 | Stargazers: 24

AIOS

AIOS: LLM Agent Operating System

Language: Python | License: MIT | Stargazers: 3359

Nanoflow

A throughput-oriented high-performance serving framework for LLMs

Language: Cuda | License: Apache-2.0 | Stargazers: 615

triton-shared

Shared Middle-Layer for Triton Compilation

Language: MLIR | License: MIT | Stargazers: 177

triton-tvm

Triton to TVM transpiler.

Language: C++ | License: NOASSERTION | Stargazers: 15

nviwatch

NviWatch: A blazingly fast Rust-based TUI for managing and monitoring NVIDIA GPU processes

Language: Rust | License: GPL-3.0 | Stargazers: 172

attention-gym

Helpful tools and examples for working with flex-attention

Language: Python | License: BSD-3-Clause | Stargazers: 433

Language: Python | Stargazers: 16

VideoSys

VideoSys: An easy and efficient system for video generation

Language: Python | License: Apache-2.0 | Stargazers: 1740

ChatGLM-6B

ChatGLM-6B: An Open Bilingual Dialogue Language Model

Language: Python | License: Apache-2.0 | Stargazers: 40587

streaming-llm

[ICLR 2024] Efficient Streaming Language Models with Attention Sinks

Language: Python | License: MIT | Stargazers: 6621

JSQ

[ICML 2024] JSQ: Compressing Large Language Models by Joint Sparsification and Quantization

Language: Python | License: MIT | Stargazers: 148

wanda

A simple and effective LLM pruning approach.

Language: Python | License: MIT | Stargazers: 646

qlib

Qlib is an AI-oriented quantitative investment platform that aims to realize the potential, empower research, and create value using AI technologies in quantitative investment, from exploring ideas to implementing production systems. Qlib supports diverse machine learning modeling paradigms, including supervised learning, market dynamics modeling, and RL.

Language: Python | License: MIT | Stargazers: 15418

Awesome-LLM-Inference

📖 A curated list of awesome LLM inference papers with code, covering TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention, etc.

License: GPL-3.0 | Stargazers: 2726

TiledCUDA

TiledCUDA is a highly efficient kernel template library designed to elevate CUDA C’s level of abstraction for processing tiles.

Language: C++ | License: MIT | Stargazers: 143

Spec-Bench

Spec-Bench: A Comprehensive Benchmark and Unified Evaluation Platform for Speculative Decoding (ACL 2024 Findings)

Language: Python | License: Apache-2.0 | Stargazers: 173

EAGLE

Official Implementation of EAGLE-1 (ICML'24) and EAGLE-2 (EMNLP'24)

Language: Python | License: Apache-2.0 | Stargazers: 808

veScale

A PyTorch Native LLM Training Framework

Language: Python | License: Apache-2.0 | Stargazers: 646

flute

Fast Matrix Multiplications for Lookup Table-Quantized LLMs

Language: C++ | License: Apache-2.0 | Stargazers: 181

Language: C++ | License: Apache-2.0 | Stargazers: 17

SpeculativeDecodingPapers

📰 Must-read papers and blogs on Speculative Decoding ⚡️

License: Apache-2.0 | Stargazers: 421

Language: Python | Stargazers: 39