sgugger / Deep-Learning

A few notebooks about deep learning in PyTorch

1cycle Policy: Unexpected results

karanchahal opened this issue

Hey,

I was implementing the 1cycle policy as an exercise, and I have a few observations from my experiments. My setup:

Model: ResNet-18
Batch size for training: 128
Batch size for testing: 100

Optimizer: optim.SGD(net.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
Total number of epochs: 26

1cycle policy: the learning rate goes from 0.01 up to 0.1 and back down over 24 epochs.

The model is then trained for 2 more epochs at a learning rate of 0.001.

No cyclical momentum or AdamW is used.
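For concreteness, here is a minimal sketch of that schedule as a hand-rolled per-epoch update. The piecewise-linear ramp and the use of torchvision's resnet18 are my assumptions; the training loop itself is elided:

```python
import torch.optim as optim
import torchvision.models as models

# Hyperparameters from this experiment
lr_min, lr_max, lr_final = 0.01, 0.1, 0.001
cycle_epochs, final_epochs = 24, 2

net = models.resnet18()  # stand-in; in practice a CIFAR-style ResNet-18
optimizer = optim.SGD(net.parameters(), lr=lr_min,
                      momentum=0.9, weight_decay=5e-4)

def one_cycle_lr(epoch):
    """0.01 -> 0.1 over the first 12 epochs, back down to 0.01 over
    the next 12, then a constant 0.001 for the final 2 epochs."""
    half = cycle_epochs / 2
    if epoch < half:
        return lr_min + (epoch / half) * (lr_max - lr_min)
    if epoch < cycle_epochs:
        return lr_max - ((epoch - half) / half) * (lr_max - lr_min)
    return lr_final

for epoch in range(cycle_epochs + final_epochs):
    for group in optimizer.param_groups:
        group['lr'] = one_cycle_lr(epoch)
    # ... one epoch of training and evaluation here ...
```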

I achieved a test set accuracy of 93.4% in 26 epochs.

This seems like a big difference from the 70 epochs at a batch size of 512 quoted in your blog post.

Am I doing something wrong? And is the number of epochs a good metric to base results on, given that it depends on the batch size?

The whole point of super-convergence is to use high learning rates to converge more quickly, but it seems like training with lower learning rates (0.01-0.1, versus 0.8-3) is faster.

Sorry, I didn't see this until now.
The blog post you're referring to is a bit old now, and it dates from when we were just getting to grips with super-convergence. Now we can train to 94% accuracy in 30 epochs (see here) with 1cycle and AdamW.
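For anyone landing here later, a minimal sketch of that combination, assuming PyTorch's built-in OneCycleLR scheduler (which implements the policy, including cycling beta1 for Adam-type optimizers). The hyperparameter values are placeholders, not the notebook's actual settings, and `net`, `train_loader`, and `loss_fn` are assumed to be defined:

```python
from torch.optim import AdamW
from torch.optim.lr_scheduler import OneCycleLR

epochs = 30
steps_per_epoch = len(train_loader)  # train_loader assumed defined

optimizer = AdamW(net.parameters(), lr=1e-3, weight_decay=1e-2)
scheduler = OneCycleLR(optimizer, max_lr=1e-3, epochs=epochs,
                       steps_per_epoch=steps_per_epoch,
                       cycle_momentum=True)  # also cycles Adam's beta1

for epoch in range(epochs):
    for inputs, targets in train_loader:
        optimizer.zero_grad()
        loss = loss_fn(net(inputs), targets)  # loss_fn assumed defined
        loss.backward()
        optimizer.step()
        scheduler.step()  # OneCycleLR steps once per batch, not per epoch
```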