LearningRateFinder during training
X-Bruce-Y opened this issue · comments
Description & Motivation
The LearningRateFinder callback is a great way to find a learning rate that is likely to reduce the training loss from the start. However, in my case, from epoch 2 onward the learning rate needs to be decreased sharply, otherwise the loss stops dropping. That makes me wonder why the technique is not extended to the whole training process. Although that would make each epoch take longer (presumably < 2x), the overall efficiency should be expected to improve.
Pitch
Extend the LearningRateFinder callback so that it can be invoked repeatedly throughout training, with a user-defined interval (probably measured in epochs), plus start_epoch, end_epoch, etc.
Alternatives
Currently it is possible to train for one epoch, save a checkpoint, re-initialize the model from that checkpoint, use the callback to find a new learning rate, and repeat. However, this is indirect and slows training down too much.
Additional context
No response
cc @Borda