YangjieZhou's starred repositories
ServerlessLLM
Serverless LLM Serving for Everyone
Cute-Learning
Examples of CUDA implementations using CUTLASS CuTe
torchtitan
A native PyTorch Library for large model training
TidalDecode
TidalDecode: A Fast and Accurate LLM Decoding with Position Persistent Sparse Attention
ParrotServe
[OSDI'24] Serving LLM-based Applications Efficiently with Semantic Variable
splitwise-sim
LLM serving cluster simulator
Awesome-LLM-Strawberry
A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 and reasoning techniques.
triton-shared
Shared Middle-Layer for Triton Compilation
triton-tvm
Triton to TVM transpiler.
attention-gym
Helpful tools and examples for working with flex-attention
ChatGLM-6B
ChatGLM-6B: An Open Bilingual Dialogue Language Model
streaming-llm
[ICLR 2024] Efficient Streaming Language Models with Attention Sinks
qlib
Qlib is an AI-oriented quantitative investment platform that aims to realize the potential, empower research, and create value using AI technologies in quantitative investment, from exploring ideas to implementing them in production. Qlib supports diverse machine learning modeling paradigms, including supervised learning, market dynamics modeling, and RL.
Awesome-LLM-Inference
📖 A curated list of Awesome LLM Inference papers with code, covering TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention, etc.
Spec-Bench
Spec-Bench: A Comprehensive Benchmark and Unified Evaluation Platform for Speculative Decoding (ACL 2024 Findings)
SpeculativeDecodingPapers
📰 Must-read papers and blogs on Speculative Decoding ⚡️