D-PSGD
Algorithm: Decentralized Parallel Stochastic Gradient Descent
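The README names the algorithm but does not spell out its update. In the standard D-PSGD step, each worker averages its model with its neighbors through a doubly stochastic mixing matrix W and then takes a local stochastic gradient step: x_i <- sum_j W_ij * x_j - lr * g_i. The minimal sketch below simulates this in a single process with plain Python. The ring topology, the 1/3 mixing weights, the scalar quadratic local losses, and the names `ring_mixing_matrix` and `d_psgd` are all illustrative assumptions, not taken from PSGD.py. Because PSGD.py uses a coordinator node, its actual structure may differ.

```python
import random


def ring_mixing_matrix(n):
    """Doubly stochastic mixing matrix for a ring of n workers (assumed topology):
    each worker keeps 1/3 of its own model and takes 1/3 from each neighbor."""
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i][j % n] += 1.0 / 3.0
    return W


def d_psgd(targets, lr=0.1, steps=200, noise_std=0.0, seed=0):
    """Simulate D-PSGD on len(targets) workers. Worker i minimizes the
    hypothetical local loss f_i(x) = 0.5 * (x - targets[i]) ** 2, so the
    global optimum of the averaged loss is mean(targets)."""
    rng = random.Random(seed)
    n = len(targets)
    W = ring_mixing_matrix(n)
    x = [0.0] * n  # each worker's local model (a scalar in this toy example)
    for _ in range(steps):
        # Stochastic local gradient, evaluated at each worker's current model.
        grads = [(x[i] - targets[i]) + rng.gauss(0.0, noise_std) for i in range(n)]
        # Gossip-average with ring neighbors via W, then apply the local gradient.
        x = [sum(W[i][j] * x[j] for j in range(n)) - lr * grads[i] for i in range(n)]
    return x


if __name__ == "__main__":
    models = d_psgd([0.0, 1.0, 2.0, 3.0])
    print([round(m, 3) for m in models])
```

With targets `[0.0, 1.0, 2.0, 3.0]`, the average of the workers' models settles at the global optimum 1.5. The individual models stay spread around it (roughly 1.335 to 1.665 in this run) because the step size is constant. A smaller `lr` shrinks that spread.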
Requirements
- Install PyTorch (pytorch.org)
- A GPU cluster with OpenMPI installed for inter-node communication
Training
A 20-layer ResNet model and the CIFAR-10 dataset are used for evaluation. Use the command below to start a training run with 1 coordinator node and 4 training nodes (5 MPI processes in total):
mpirun -n 5 --hostfile hosts python PSGD.py --epochs 160 --lr 0.5
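`mpirun -n 5` launches five processes, one for each node role described above. The minimal sketch below shows how a script might map OpenMPI ranks to those roles. It assumes, hypothetically, that rank 0 is the coordinator and ranks 1 to 4 are training nodes exchanging models in a ring. OpenMPI exports `OMPI_COMM_WORLD_RANK` and `OMPI_COMM_WORLD_SIZE` to every process it launches. The helper `role_and_neighbors` and the ring layout are illustrative assumptions and may not match how PSGD.py assigns roles.

```python
import os


def role_and_neighbors(rank, world_size):
    """Hypothetical layout: rank 0 is the coordinator; ranks 1..world_size-1
    are training nodes connected in a ring and return their ring neighbors."""
    if rank == 0:
        return "coordinator", []
    workers = list(range(1, world_size))
    idx = workers.index(rank)
    n = len(workers)
    left = workers[(idx - 1) % n]
    right = workers[(idx + 1) % n]
    # Use a set so a 2-worker ring does not list the same neighbor twice,
    # and drop self in the degenerate 1-worker case.
    return "trainer", sorted({left, right} - {rank})


if __name__ == "__main__":
    # OpenMPI sets these variables for each launched process; defaults allow a
    # plain `python` run outside mpirun.
    rank = int(os.environ.get("OMPI_COMM_WORLD_RANK", "0"))
    size = int(os.environ.get("OMPI_COMM_WORLD_SIZE", "1"))
    role, neighbors = role_and_neighbors(rank, size)
    print(f"rank {rank}/{size}: {role}, neighbors {neighbors}")
```

Under this assumed layout, with 5 processes, training rank 1 gossips with ranks 2 and 4, and rank 0 only coordinates.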