Sukjun Hwang's starred repositories
ThunderKittens
Tile primitives for speedy kernels
torchtitan
A native PyTorch library for large model training
lightning-thunder
Make PyTorch models up to 40% faster! Thunder is a source-to-source compiler for PyTorch. It enables using different hardware executors at once, across one or thousands of GPUs.
Score-Entropy-Discrete-Diffusion
[ICML 2024 Oral] Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution (https://arxiv.org/abs/2310.16834)
resource-stream
CUDA related news and material links
pytorch-lightning
Pretrain, finetune and deploy AI models on multiple GPUs or TPUs with zero code changes.
contrastors
Train Models Contrastively in PyTorch
visualwebarena
VisualWebArena is a benchmark for multimodal agents.
flash-linear-attention
Efficient implementations of state-of-the-art linear attention models in PyTorch and Triton
awesome-ssm-ml
Reading list for research topics in state-space models
LLM-Training-Puzzles
What would you do with 1000 H100s...
GPU-Puzzles
Solve puzzles. Learn CUDA.
Tensor-Puzzles
Solve puzzles. Improve your PyTorch.