Cheng Luo's repositories
Pensieve-PPO
The simplest implementation of Pensieve (SIGCOMM '17) via state-of-the-art RL algorithms, including PPO, DQN, and SAC
FlexFlow
FlexFlow Serve: Low-Latency, High-Performance LLM Serving
LASP
Linear Attention Sequence Parallelism (LASP)
neuraloperator
Learning in infinite dimension with neural operators.
Open-Sora-old
Building your own video generation model like OpenAI's Sora
OpenDiT
OpenDiT: An Easy, Fast and Memory-Efficient System for DiT Training and Inference
SIMPLE
Selfplay In MultiPlayer Environments
Speculative-Sampling
Implementation of Speculative Sampling as described in "Accelerating Large Language Model Decoding with Speculative Sampling" by DeepMind
streaming-llm
Efficient Streaming Language Models with Attention Sinks
tensorly
TensorLy: Tensor Learning in Python.
tltorch
TensorLy-Torch: Deep Tensor Learning with TensorLy and PyTorch
triton
Development repository for the Triton language and compiler
vllm
A high-throughput and memory-efficient inference and serving engine for LLMs