Alex Wang's starred repositories
Perplexica
Perplexica is an AI-powered search engine. It is an open-source alternative to Perplexity AI.
Awesome-Multimodal-Large-Language-Models
✨✨ Latest Advances on Multimodal Large Language Models
torchtitan
A native PyTorch Library for large model training
ThunderKittens
Tile primitives for speedy kernels
flash-linear-attention
Efficient implementations of state-of-the-art linear attention models in PyTorch and Triton
conditional-flow-matching
TorchCFM: a Conditional Flow Matching library
data-to-paper
data-to-paper: Backward-traceable AI-driven scientific research
SGEMM_CUDA
Fast CUDA matrix multiplication from scratch
Annotated-ML-Papers
Annotations of the interesting ML papers I read
GPUSorting
State-of-the-art sorting and segmented sorting, including OneSweep. Implemented in CUDA, D3D12, and Unity-style compute shaders. Theoretically portable to all wave/warp/subgroup sizes.
FlashAttention-PyTorch
Implementation of FlashAttention in PyTorch
GPUPrefixSums
A nearly complete collection of prefix sum algorithms implemented in CUDA, D3D12, Unity and WGPU. Theoretically portable to all wave/warp/subgroup sizes.
gpu-prefix-sum
CUDA implementation of exclusive prefix sum via Blelloch's algorithm
Pytorch-Depthwise-Conv3d
CUDA implementation of depthwise Conv3d
Parallel-Computing
Implementation of various Parallel Computing algorithms using CUDA C++