Jonathan Tow's repositories
accelerate
🚀 A simple way to train and use PyTorch models with multi-GPU, TPU, and mixed-precision support
jon-tow.github.io
My personal website
cc_net
Tools to download and clean up Common Crawl data
contriever
Contriever: Unsupervised Dense Information Retrieval with Contrastive Learning
CPCargo
A simple package to upload DL checkpoints to remote storage
DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
english-wordnet
The Open English WordNet
flash-attention
Fast and memory-efficient exact attention
goodreads
Code samples for the Goodreads datasets
helm
Holistic Evaluation of Language Models (HELM), a framework to increase the transparency of language models (https://arxiv.org/abs/2211.09110).
kernl
Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackable.
Megatron-LLM
Distributed trainer for LLMs
ml-engineering
Machine Learning Engineering Guides and Tools
rerope
Rectified Rotary Position Embeddings
ring-flash-attention
Ring attention implementation with flash attention
scattermoe
Triton-based implementation of Sparse Mixture of Experts.
text-dedup
All-in-one text de-duplication
transformers
🤗 Transformers: State-of-the-art machine learning for PyTorch, TensorFlow, and JAX.
triton
Development repository for the Triton language and compiler
zero-bubble-pipeline-parallelism
Zero Bubble Pipeline Parallelism