zhengzangw / pytorch-cifar

95.16% on CIFAR10 with PyTorch


Tutorial: Train CIFAR10 with PyTorch on NERSC Cori-GPU

Here is a tutorial on how to train deep learning models on the CIFAR10 dataset on the Cori-GPU platform using PyTorch.

Submit an interactive job

First, request one or more GPUs using the following commands. See this page for further details.

module load esslurm
salloc -C gpu -N 1 -t 60 -c 10 -G 1 -A m3691

Then run the following commands to kick off training.

module load pytorch/v1.5.0-gpu
srun python main.py

Submit a batch job

Run the following commands for submitting a batch job.

sbatch train_cgpu.sh

The dashboard on my.nersc.gov sometimes fails to display jobs running on the GPU cluster, so a more reliable way is to run jobstats in the terminal to view the job status. When the job starts running, its status will change from PENDING to RUNNING.

In batch mode, the results are redirected to <job_id>.out under your working directory by default.

Continuously run on NERSC

Run the following command to train continuously on NERSC.

python train_nersc.py --name cifar --interval 60 > cifar.log &

The interval is the number of minutes between two status checks for re-launching.

To quickly test that the script works, try setting the time limit in train_cgpu.sh to 3 minutes and run

python train_nersc.py --interval 1

You can build your own script based on this one.
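A minimal sketch of what such a re-launch watchdog might look like (the squeue/sbatch invocations and the watchdog function are assumptions for illustration; train_nersc.py's actual logic may differ):

```python
import subprocess
import time

def needs_relaunch(job_states):
    """Return True when no job is PENDING or RUNNING, i.e. training
    has finished (or failed) and should be resubmitted."""
    return not any(state in ("PENDING", "RUNNING") for state in job_states)

def watchdog(interval_minutes):
    """Hypothetical sketch: every `interval_minutes`, list our SLURM
    job states and resubmit the batch script if none are alive."""
    while True:
        out = subprocess.run(
            ["squeue", "--me", "--noheader", "--format=%T"],
            capture_output=True, text=True,
        ).stdout
        if needs_relaunch(out.split()):
            subprocess.run(["sbatch", "train_cgpu.sh"])
        time.sleep(interval_minutes * 60)
```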


Prerequisites

  • Python 3.6+
  • PyTorch 1.0+

Accuracy

| Model            | Acc.   |
| ---------------- | ------ |
| VGG16            | 92.64% |
| ResNet18         | 93.02% |
| ResNet50         | 93.62% |
| ResNet101        | 93.75% |
| RegNetX_200MF    | 94.24% |
| RegNetY_400MF    | 94.29% |
| MobileNetV2      | 94.43% |
| ResNeXt29(32x4d) | 94.73% |
| ResNeXt29(2x64d) | 94.82% |
| DenseNet121      | 95.04% |
| PreActResNet18   | 95.11% |
| DPN92            | 95.16% |

Learning rate adjustment

I manually change the learning rate during training:

  • 0.1 for epoch [0,150)
  • 0.01 for epoch [150,250)
  • 0.001 for epoch [250,350)

Resume training with:

python main.py --resume --lr=0.01
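The schedule above can be expressed as a small helper; the param_groups update shown in the comment is the standard PyTorch way to set the rate manually (the optimizer variable is hypothetical):

```python
def learning_rate(epoch):
    """Piecewise-constant schedule from the list above:
    0.1 for epochs [0, 150), 0.01 for [150, 250), 0.001 for [250, 350)."""
    if epoch < 150:
        return 0.1
    if epoch < 250:
        return 0.01
    return 0.001

# Applied once per epoch to a (hypothetical) PyTorch optimizer:
# for group in optimizer.param_groups:
#     group["lr"] = learning_rate(epoch)
```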

About

95.16% on CIFAR10 with PyTorch

License: MIT


Languages

Python 99.5%, Shell 0.5%