Alex's starred repositories
ml-engineering
Machine Learning Engineering Open Book
mistral-inference
Official inference library for Mistral models
moe_attention
Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention"
dilated-attention-pytorch
(Unofficial) Implementation of dilated attention from "LongNet: Scaling Transformers to 1,000,000,000 Tokens" (https://arxiv.org/abs/2307.02486)
dilated-self-attention
Implementation of dilated self-attention as described in "LongNet: Scaling Transformers to 1,000,000,000 Tokens"
TorchIntegral
Integral Neural Networks in PyTorch
Segment-and-Track-Anything
An open-source project for tracking and segmenting any object in videos, either automatically or interactively. Its primary algorithms are the Segment Anything Model (SAM) for key-frame segmentation and Associating Objects with Transformers (AOT) for efficient tracking and propagation.
web-stable-diffusion
Bringing stable diffusion models to web browsers. Everything runs inside the browser with no server support.
tuning_playbook
A playbook for systematically maximizing the performance of deep learning models.
FasterTransformer
Transformer-related optimization, including BERT and GPT
TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.
FLASHQuad_pytorch
FLASHQuad_pytorch
diffusion_distiller
🚀 PyTorch Implementation of "Progressive Distillation for Fast Sampling of Diffusion Models" (v-diffusion)
FLASH-pytorch
Implementation of the Transformer variant proposed in "Transformer Quality in Linear Time"
guided-inpainting
Towards Unified Keyframe Propagation Models
flash-attention
Fast and memory-efficient exact attention