1cycle Policy: Unfamiliar results
karanchahal opened this issue · comments
Hey,
I was implementing 1 cycle policy as an exercise. And I have a few observations from my experiments.
My setup:
Model: ResNet18
Batch size for training: 128
Batch size for testing: 100
Optimizer: optim.SGD(net.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
Total number of epochs: 26
1cycle policy: the learning rate goes from 0.01 up to 0.1 and back down over the first 24 epochs.
Then the model is trained for 2 more epochs at a 0.001 learning rate.
No cyclic momentum or AdamW used.
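For reference, the schedule described above can be sketched as a small helper (a hypothetical function, not the actual code used; it assumes a piecewise-linear ramp, which is one common way to implement 1cycle):

```python
def one_cycle_lr(epoch, cycle_epochs=24, lr_min=0.01, lr_max=0.1, final_lr=0.001):
    """Return the learning rate for a given epoch under the schedule above:
    linear ramp lr_min -> lr_max over the first half of the cycle,
    linear ramp back down over the second half,
    then a constant final_lr for the remaining epochs."""
    half = cycle_epochs / 2
    if epoch < half:
        # Warm-up phase: 0.01 -> 0.1
        return lr_min + (lr_max - lr_min) * (epoch / half)
    elif epoch < cycle_epochs:
        # Cool-down phase: 0.1 -> 0.01
        return lr_max - (lr_max - lr_min) * ((epoch - half) / half)
    else:
        # Final fine-tuning phase at a much lower rate
        return final_lr

# e.g. set the optimizer's lr at the start of each epoch:
# for g in optimizer.param_groups:
#     g["lr"] = one_cycle_lr(epoch)
```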
I achieved a test set accuracy of 93.4% in 26 epochs.
This seems like a big difference from the 70 epochs at 512 batch size quoted in your blog post.
Am I doing something wrong? Is the number of epochs a good metric to base results on, since it depends on the batch size?
The whole point of super-convergence is using high learning rates to converge quicker, but it seems like using lower learning rates (0.01-0.1 rather than 0.8-3) trains faster.
Sorry, I didn't see this until now.
The blog post you're referring to is a bit old now, from when we were just getting to grips with super-convergence. We can now train to 94% accuracy in 30 epochs (see here) with 1cycle and AdamW.