jingxu10 / Resnet50_DDP

Demo of resnet50_ddp with PyTorch

Distributed training (Distributed Data Parallel) demo with Resnet50.

  • To run this demo on Intel(R) DevCloud, check out the devcloud branch.

How to run

  1. Run with torch.distributed.launch script
python -m torch.distributed.launch --nproc_per_node=2 resnet_ddp.py
  2. Run with torchrun
torchrun --nproc_per_node=2 resnet_ddp.py
  3. Run with IPEX launch script
source /opt/intel/oneapi/mpi/latest/env/vars.sh
python launch.py --distributed --nproc_per_node 2 resnet_ddp.py
  4. Run with Horovod
horovodrun -np 2 python resnet_ddp.py
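
Each launcher above starts two processes and tells each one its position in the job, usually through environment variables. The following is a minimal sketch of how a script could read that information in a launcher-agnostic way; it is not taken from resnet_ddp.py. torchrun sets RANK, WORLD_SIZE and LOCAL_RANK. The MPI-style names (PMI_RANK, PMI_SIZE, MPI_LOCALRANKID, OMPI_COMM_WORLD_*) are assumptions about the MPI runtime behind the launcher and may differ on a given system. The helper names `detect_process_layout` and `_first_int` are hypothetical. With Horovod, the usual route is `hvd.init()` followed by `hvd.rank()` and `hvd.size()` rather than environment variables.

```python
import os

# Candidate variable names, checked in order; the first one present wins.
# RANK / WORLD_SIZE / LOCAL_RANK are set by torchrun.
# The remaining names are assumptions about the MPI runtime in use.
_RANK_VARS = ("RANK", "PMI_RANK", "OMPI_COMM_WORLD_RANK")
_SIZE_VARS = ("WORLD_SIZE", "PMI_SIZE", "OMPI_COMM_WORLD_SIZE")
_LOCAL_RANK_VARS = ("LOCAL_RANK", "MPI_LOCALRANKID", "OMPI_COMM_WORLD_LOCAL_RANK")


def _first_int(env, names, default):
    """Return the first variable in `names` found in `env` as an int, else `default`."""
    for name in names:
        value = env.get(name)
        if value is not None:
            return int(value)
    return default


def detect_process_layout(env=None):
    """Read rank, world size and local rank from a mapping (os.environ by default).

    Falls back to a single-process layout when no known variable is set.
    """
    env = os.environ if env is None else env
    return {
        "rank": _first_int(env, _RANK_VARS, 0),
        "world_size": _first_int(env, _SIZE_VARS, 1),
        "local_rank": _first_int(env, _LOCAL_RANK_VARS, 0),
    }


if __name__ == "__main__":
    # Under `torchrun --nproc_per_node=2`, each process reports a different rank.
    print(detect_process_layout())
```

Passing the environment in as a plain mapping keeps the helper easy to check without starting a distributed job.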

Set backend

python resnet_ddp.py --backend [ccl|nccl|gloo|...]
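
As a rough illustration of what a `--backend` flag typically controls, here is a minimal sketch that parses the flag and passes it to `torch.distributed.init_process_group`. It is not the code in resnet_ddp.py. The function names `parse_args` and `init_distributed` and the default value `gloo` are assumptions made for the example. The `ccl` branch assumes the oneCCL bindings package (`oneccl_bindings_for_pytorch`) is installed; importing it registers the `ccl` backend with PyTorch.

```python
import argparse


def parse_args(argv=None):
    """Parse the command line; `--backend` mirrors the flag shown above."""
    parser = argparse.ArgumentParser(description="ResNet50 DDP demo (sketch)")
    parser.add_argument(
        "--backend",
        default="gloo",  # assumed default, chosen because gloo ships with PyTorch
        help="torch.distributed backend, e.g. ccl, nccl or gloo",
    )
    return parser.parse_args(argv)


def init_distributed(backend):
    """Initialize the default process group and return (rank, world_size)."""
    # Imported lazily so that argument parsing works without PyTorch installed.
    import torch.distributed as dist

    if backend == "ccl":
        # Importing the oneCCL bindings registers the "ccl" backend.
        import oneccl_bindings_for_pytorch  # noqa: F401

    # With the default env:// init method, MASTER_ADDR, MASTER_PORT, RANK and
    # WORLD_SIZE must be present in the environment (torchrun sets them).
    dist.init_process_group(backend=backend)
    return dist.get_rank(), dist.get_world_size()
```

A script would call `init_distributed(parse_args().backend)` once at startup, before wrapping the model in `DistributedDataParallel`.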
