Ferdinand Mom's repositories
3outeille.github.io
My website
kernel-builder
👷 Build compute kernels
ColossalAI
Making large AI models cheaper, faster and more accessible
DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
diloco_simple
PyTorch implementation of DiLoCo
DualPipe
A bidirectional pipeline parallelism algorithm for computation-communication overlap in V3/R1 training.
dust
A Nintendo DS emulator written in Rust for desktop devices and the web, with debugging features and a focus on accuracy
EasyContext
Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware.
fms-fsdp
Demonstrate throughput of PyTorch FSDP
gpt-oss-recipes
Collection of scripts and notebooks for OpenAI's latest GPT OSS models
kernels
Load compute kernels from the Hub
lighteval
LightEval is a lightweight LLM evaluation suite that Hugging Face has been using internally, alongside its recently released LLM data processing library datatrove and LLM training library nanotron.
litgpt
20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.
Megatron-LM
Ongoing research training transformer models at scale
nccl
Optimized primitives for collective multi-GPU communication
nccl-tests
NCCL Tests
peft
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
picotron-deepseek
Minimalistic 4D-parallelism distributed training framework for educational purposes
prime-rl
Decentralized RL Training at Scale
pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
quack
A Quirky Assortment of CuTe Kernels
ring-attention-pytorch
Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in PyTorch
tilelang
Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels
torchtitan
A native PyTorch Library for large model training
veScale
A PyTorch Native LLM Training Framework