Cybertron AI's repositories
gradient-checkpointing
Make huge neural nets fit in memory
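Gradient checkpointing saves memory by storing only a subset of activations during the forward pass and recomputing the rest during the backward pass. As a toy pure-Python sketch of the idea (not this repo's actual implementation, which works on TensorFlow graphs), here is backprop through a chain of scalar functions that keeps only every k-th activation:

```python
import math

def forward(fs, x):
    # standard forward pass: store every intermediate activation
    acts = [x]
    for f, _ in fs:
        acts.append(f(acts[-1]))
    return acts

def backward_full(fs, acts):
    # standard backprop: consume all stored activations (chain rule)
    g = 1.0
    for (f, df), a in zip(reversed(fs), reversed(acts[:-1])):
        g *= df(a)
    return g

def backward_checkpointed(fs, x, k):
    # forward: keep only every k-th activation ("checkpoints")
    ckpts = {0: x}
    a = x
    for i, (f, _) in enumerate(fs):
        a = f(a)
        if (i + 1) % k == 0:
            ckpts[i + 1] = a
    # backward: recompute each segment from its nearest checkpoint,
    # trading extra compute for O(n/k) stored activations
    g = 1.0
    i = len(fs)
    while i > 0:
        start = k * ((i - 1) // k)
        seg = [ckpts[start]]
        for f, _ in fs[start:i]:
            seg.append(f(seg[-1]))
        for (f, df), a in zip(reversed(fs[start:i]), reversed(seg[:-1])):
            g *= df(a)
        i = start
    return g
```

Both routines produce the same gradient; the checkpointed version simply recomputes segments instead of storing them.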
imagenet18_old
Code to reproduce the "ImageNet in 18 minutes" DAWNBench entry
pytorch-lamb
Implementation of the LAMB optimizer (https://arxiv.org/abs/1904.00962)
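LAMB combines Adam-style adaptive moments with a layer-wise trust ratio that rescales each step by ||w|| / ||update||, which stabilizes very large-batch training. A minimal NumPy sketch of a single-tensor LAMB step (illustrative only; hyperparameter names follow the paper, not this repo's API):

```python
import numpy as np

def lamb_step(w, g, m, v, t, lr=0.01, b1=0.9, b2=0.999, eps=1e-6, wd=0.01):
    # Adam-style first and second moment estimates with bias correction
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    update = m_hat / (np.sqrt(v_hat) + eps) + wd * w
    # layer-wise trust ratio: scale the step by ||w|| / ||update||
    w_norm = np.linalg.norm(w)
    u_norm = np.linalg.norm(update)
    trust = w_norm / u_norm if w_norm > 0 and u_norm > 0 else 1.0
    w = w - lr * trust * update
    return w, m, v
```

Because the step magnitude is proportional to the weight norm, the effective learning rate adapts per layer rather than globally.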
transformer-xl
Training Transformer-XL on 128 GPUs
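Scaling training to 128 GPUs typically means synchronous data parallelism: every GPU computes gradients on its own shard of the batch, then an all-reduce averages them so all replicas apply the identical update. A pure-Python simulation of that invariant (the real repo uses distributed PyTorch; names here are hypothetical):

```python
def local_gradient(w, xs, ys):
    # gradient of mean squared error for a scalar linear model y ~ w * x
    n = len(xs)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n

def allreduce_mean(grads):
    # stand-in for the all-reduce collective: average one gradient per "GPU"
    return sum(grads) / len(grads)

def data_parallel_step(w, shards, lr=0.01):
    # each worker computes a gradient on its shard; all workers then apply
    # the same averaged gradient, keeping the replicas in sync
    grads = [local_gradient(w, xs, ys) for xs, ys in shards]
    return w - lr * allreduce_mean(grads)
```

With equal-sized shards, the averaged gradient equals the full-batch gradient, which is why synchronous data parallelism is mathematically equivalent to large-batch SGD.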
pytorch-sso
PyTorch-SSO: Scalable Second-Order methods in PyTorch
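Second-order methods precondition the gradient with curvature information (PyTorch-SSO focuses on scalable approximations such as Kronecker-factored curvature). As a minimal illustration of the underlying idea, not this repo's API, here is a damped Newton step, which solves a quadratic in one update:

```python
import numpy as np

def newton_step(w, grad, hess, damping=0.0):
    # second-order update: solve (H + damping*I) d = grad, then step w - d;
    # damping regularizes an ill-conditioned or indefinite Hessian
    H = hess + damping * np.eye(len(w))
    return w - np.linalg.solve(H, grad)
```

For f(w) = 0.5 w^T A w - b^T w, the gradient is A w - b and the Hessian is A, so one undamped step from any point lands on the minimizer A^{-1} b.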
imagenet18
Train ImageNet in 18 minutes on AWS
Megatron-LM
Ongoing research on training transformer language models at scale, including BERT
pytorch-fd
Implementation of fluctuation-dissipation relations for automatic learning rate annealing
aws-network-benchmarks
Tools to benchmark AWS network performance
pytorch-aws
Example code for "PyTorch on AWS made easy"