Tiancheng Chen's starred repositories
ThunderKittens
Tile primitives for speedy kernels
float8_experimental
This repository contains the experimental PyTorch native float8 training UX
multi-gpu-programming-models
Examples demonstrating available options to program multiple GPUs in a single node or a cluster
torchtitan
A native PyTorch library for large model training
awesome-RLHF
A curated list of reinforcement learning with human feedback resources (continually updated)
Awesome_LLM_System-PaperList
Since the emergence of ChatGPT in 2022, accelerating Large Language Models has become increasingly important. This is a list of papers on LLM acceleration, currently focused mainly on inference; related work will be added over time. Contributions welcome!
microxcaling
PyTorch emulation library for Microscaling (MX)-compatible data formats
TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
Pai-Megatron-Patch
The official repo of Pai-Megatron-Patch for large-scale LLM & VLM training, developed by Alibaba Cloud.
long-context-attention
Sequence-parallel attention for long-context LLM training and inference
ring-flash-attention
Ring attention implementation built on FlashAttention
ml-engineering
Machine Learning Engineering Open Book
superbenchmark
A validation and profiling tool for AI infrastructure
Megatron-LM
Ongoing research training transformer models at scale