# Distributed Training in PyTorch on ImageNette

This repository contains working code to train on ImageNette using Distributed Data Parallel (DDP) in PyTorch and Hugging Face Accelerate.
For a deep dive into the Hugging Face Accelerate package, see the post *Inside Hugging Face's Accelerate!*.
To run the scripts, first download the data by running the following commands from the root directory of this repository:

```shell
mkdir data && cd data
wget https://s3.amazonaws.com/fast-ai-imageclas/imagenette2-160.tgz
tar -xvf imagenette2-160.tgz
```
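After extraction, a quick sanity check with the standard library can confirm that both splits contain all ten class folders. The helper below is our own illustration (not part of this repository); the expected count of 10 comes from the ImageNette class list:

```python
from pathlib import Path

def missing_class_dirs(root: str, split: str, n_expected: int = 10) -> int:
    """Return how many class folders are missing under <root>/<split>.

    ImageNette stores one folder per WordNet synset ID under train/ and
    val/; after a clean extraction each split contains exactly 10 of them.
    (Illustrative helper, not part of this repository.)
    """
    split_dir = Path(root) / split
    if not split_dir.is_dir():
        return n_expected
    found = [p for p in split_dir.iterdir() if p.is_dir() and p.name.startswith("n")]
    return max(n_expected - len(found), 0)
```

After a successful download, `missing_class_dirs("data/imagenette2-160", "train")` and the same call for `"val"` should both return 0.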
Now you should have a `data` directory in the repository whose folder structure looks like:

```
data/
└── imagenette2-160
    ├── train
    │   ├── n01440764
    │   ├── n02102040
    │   ├── n02979186
    │   ├── n03000684
    │   ├── n03028079
    │   ├── n03394916
    │   ├── n03417042
    │   ├── n03425413
    │   ├── n03445777
    │   └── n03888257
    └── val
        ├── n01440764
        ├── n02102040
        ├── n02979186
        ├── n03000684
        ├── n03028079
        ├── n03394916
        ├── n03417042
        ├── n03425413
        ├── n03445777
        └── n03888257
```
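The ten folder names are WordNet synset IDs. As a convenience for interpreting them, here is a mapping to ImageNette's human-readable class names (label spellings follow the standard ImageNette description and may differ slightly elsewhere):

```python
# Mapping from ImageNette's WordNet synset IDs (the folder names above)
# to human-readable class labels.
IMAGENETTE_CLASSES = {
    "n01440764": "tench",
    "n02102040": "English springer",
    "n02979186": "cassette player",
    "n03000684": "chain saw",
    "n03028079": "church",
    "n03394916": "French horn",
    "n03417042": "garbage truck",
    "n03425413": "gas pump",
    "n03445777": "golf ball",
    "n03888257": "parachute",
}

def label_for(wnid: str) -> str:
    """Return the human-readable label for a synset folder name."""
    return IMAGENETTE_CLASSES[wnid]
```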
## Launch training using PyTorch DDP

To launch training using PyTorch DDP, run the following command from the `src` folder of this repository:

```shell
./ddp.sh <number-of-gpus>
```
## Launch training using Hugging Face Accelerate

To launch training using Hugging Face Accelerate, run the following command from the `src` folder of this repository:

```shell
accelerate launch train_accelerate.py
```
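`accelerate launch` picks up the machine/GPU configuration created by the one-time `accelerate config` wizard. If you have not run the wizard, the key settings can also be passed as command-line flags; the process count below is an example value, not something this repository prescribes:

```shell
# One-time interactive setup; writes a default_config.yaml that
# accelerate launch reads on subsequent runs.
accelerate config

# Or configure directly on the command line:
# --num_processes is the number of GPUs to use (example: 2).
accelerate launch --multi_gpu --num_processes 2 train_accelerate.py
```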