akamaster / pytorch_resnet_cifar10

Proper implementation of ResNet-s for CIFAR10/100 in PyTorch that matches the description of the original paper.

About the number of epochs?

mangozy opened this issue

How many epochs did you run to obtain the accuracy/error you report? I haven't found this documented anywhere. Could you explain your training policy a little more?

I have read the other issues in this repo and found that the number of epochs is set to 200 by default.
The other training details are:

- batch size = 128
- learning rate = 0.1
- momentum = 0.9
- weight decay = 1e-4
- optimizer = SGD
- criterion = CrossEntropyLoss
- LR scheduler = MultiStepLR (milestones = [100, 150])
- simple data augmentation: RandomHorizontalFlip(p=0.5) and RandomCrop(size=32, padding=4)
- weight initialization and batch normalization, no dropout
As we can see, this implementation strictly follows the "4.2. CIFAR-10 and Analysis" section in the original paper (i.e., ResNet). Great work!
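For reference, here is a minimal sketch of that setup in PyTorch. It is an illustration of the listed hyperparameters, not the repo's actual training script; the `resnet20` import assumes the repo's `resnet.py` is on the path, and the normalization statistics are commonly used CIFAR-10 values that are not part of the list above.

```python
import torch
import torchvision
import torchvision.transforms as transforms
from resnet import resnet20  # assumes this repo's resnet.py is importable

# Data augmentation as listed: horizontal flip + random crop with 4-pixel padding
train_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomCrop(32, padding=4),
    transforms.ToTensor(),
    # Commonly used CIFAR-10 mean/std (an assumption, not stated in the list above)
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
])

train_set = torchvision.datasets.CIFAR10(root='./data', train=True,
                                         download=True, transform=train_transform)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)

model = resnet20()
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=1e-4)
# LR is divided by 10 at epochs 100 and 150, training runs for 200 epochs
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[100, 150], gamma=0.1)

for epoch in range(200):
    model.train()
    for inputs, targets in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()
    scheduler.step()
```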

Regarding the total number of epochs and the LR schedule, the paper states:

"We start with a learning
rate of 0.1, divide it by 10 at 32k and 48k iterations, and
terminate training at 64k iterations, which is determined on
a 45k/5k train/val split"

Given that 64,000 iterations roughly translate to 180 epochs (see the reasoning below), it would be more accurate to train for that many epochs with LR scheduler milestones of [90, 135].
Conversion to epochs: epochs = iterations / (n_observations / batch_size).
Since 45000/128 = 351.56 (i.e. ~351 iterations per epoch), 64000/351.56 ≈ 182 epochs, and the 32k/48k milestones fall at roughly epochs 91 and 137.

I also often see 160 epochs with milestones [80, 120]; this follows from converting with the full 50k training set: 50000/128 = 390.63 iterations per epoch and 64000/390.63 ≈ 164 epochs, with the 32k/48k milestones at roughly epochs 82 and 123.
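A tiny helper makes the conversion explicit for both splits (the function name is just for illustration):

```python
def iters_to_epochs(iterations, n_observations, batch_size=128):
    """Convert an iteration count to epochs: iterations / (n_observations / batch_size)."""
    iters_per_epoch = n_observations / batch_size
    return iterations / iters_per_epoch

for n in (45000, 50000):
    total = iters_to_epochs(64000, n)
    milestones = [iters_to_epochs(32000, n), iters_to_epochs(48000, n)]
    print(f"{n} images: ~{total:.0f} epochs, milestones ~{milestones[0]:.0f}/{milestones[1]:.0f}")

# 45000 images: ~182 epochs, milestones ~91/137  -> rounded to 180 with [90, 135]
# 50000 images: ~164 epochs, milestones ~82/123  -> rounded to 160 with [80, 120]
```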