akamaster / pytorch_resnet_cifar10

Proper implementation of ResNet-s for CIFAR10/100 in PyTorch that matches the description of the original paper.

About the number of epochs?

mangozy opened this issue

How many epochs did you run to obtain the accuracy/error you report? I haven't found this documented anywhere. Could you explain your training policy a little more?

I have read the other issues in this repo and found that the number of epochs is set to 200 by default.
The other training details are:

- batch size = 128
- learning rate = 0.1
- momentum = 0.9
- weight decay = 1e-4
- optimizer = SGD
- criterion = CrossEntropyLoss
- LR scheduler = MultiStepLR (milestones = [100, 150])
- simple data augmentation: RandomHorizontalFlip(p=0.5) and RandomCrop(size=32, padding=4)
- weight initialization and batch normalization, no dropout
As we can see, this implementation strictly follows the "4.2. CIFAR-10 and Analysis" section in the original paper (i.e., ResNet). Great work!
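For reference, here is a minimal sketch of that setup in PyTorch. It is an illustration of the listed hyperparameters, not the repo's actual training script; the `resnet20` import assumes the repo's `resnet.py` is on the path, and the normalization statistics are commonly used CIFAR-10 values that are not part of the list above.

```python
import torch
import torchvision
import torchvision.transforms as transforms
from resnet import resnet20  # assumes this repo's resnet.py is importable

# Data augmentation as listed: horizontal flip + random crop with 4-pixel padding
train_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomCrop(32, padding=4),
    transforms.ToTensor(),
    # Commonly used CIFAR-10 mean/std (an assumption, not stated in the list above)
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
])

train_set = torchvision.datasets.CIFAR10(root='./data', train=True,
                                         download=True, transform=train_transform)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)

model = resnet20()
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=1e-4)
# LR is divided by 10 at epochs 100 and 150, training runs for 200 epochs
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[100, 150], gamma=0.1)

for epoch in range(200):
    model.train()
    for inputs, targets in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()
    scheduler.step()
```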

Regarding the total number of epochs and the LR schedule, the paper states:

"We start with a learning
rate of 0.1, divide it by 10 at 32k and 48k iterations, and
terminate training at 64k iterations, which is determined on
a 45k/5k train/val split"

Given that 64,000 iterations roughly translate to 180 epochs (see the reasoning below), it would be more accurate to train for that many epochs with LR scheduler milestones of [90, 135].
Conversion to epochs: epochs = iterations / (n_observations / batch_size).
Since 45000/128 = 351.56 (i.e. ~351 iterations per epoch), 64000/351.56 ≈ 182 epochs, and the 32k/48k milestones fall at roughly epochs 91 and 137.

I also often see 160 epochs with milestones [80, 120]; this follows from converting with the full 50k training set: 50000/128 = 390.63 iterations per epoch and 64000/390.63 ≈ 164 epochs, with the 32k/48k milestones at roughly epochs 82 and 123.
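A tiny helper makes the conversion explicit for both splits (the function name is just for illustration):

```python
def iters_to_epochs(iterations, n_observations, batch_size=128):
    """Convert an iteration count to epochs: iterations / (n_observations / batch_size)."""
    iters_per_epoch = n_observations / batch_size
    return iterations / iters_per_epoch

for n in (45000, 50000):
    total = iters_to_epochs(64000, n)
    milestones = [iters_to_epochs(32000, n), iters_to_epochs(48000, n)]
    print(f"{n} images: ~{total:.0f} epochs, milestones ~{milestones[0]:.0f}/{milestones[1]:.0f}")

# 45000 images: ~182 epochs, milestones ~91/137  -> rounded to 180 with [90, 135]
# 50000 images: ~164 epochs, milestones ~82/123  -> rounded to 160 with [80, 120]
```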