Lightning-AI / pytorch-lightning

Pretrain, finetune and deploy AI models on multiple GPUs, TPUs with zero code changes.

Home Page: https://lightning.ai

LearningRateFinder during training

X-Bruce-Y opened this issue · comments

Description & Motivation

The LearningRateFinder callback is great for finding a learning rate that is likely to reduce the training loss from the start. However, in my case the learning rate needs to be decreased sharply from epoch 2 onward, otherwise the loss stops dropping. That makes me wonder why the technique is not extended to the whole training process. Although each epoch would take longer (presumably < 2x), the overall training efficiency should be expected to improve.
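
For context, the core idea behind an LR finder can be sketched in a few lines: sweep candidate learning rates over short trial runs and keep the rate that reduces the loss the most. This is a self-contained toy illustration (a 1-D quadratic loss), not Lightning's actual implementation; all names here are illustrative.

```python
def trial_loss(lr, steps=20):
    """Run a few gradient steps on the toy loss f(w) = w**2,
    starting from w = 1.0, and return the final loss."""
    w = 1.0
    for _ in range(steps):
        grad = 2.0 * w          # d/dw of w**2
        w -= lr * grad
    return w * w

def find_lr(candidates):
    """Return the candidate learning rate with the lowest trial loss."""
    return min(candidates, key=trial_loss)

# Exponential sweep from 1e-4 up to 1.0, half a decade per step
candidates = [1e-4 * (10 ** (k / 2)) for k in range(9)]
best = find_lr(candidates)
```

The point of the issue is that `best` is only optimal for the current loss landscape; re-running a sweep like this at later epochs could pick a smaller rate once the initial one stops helping.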

Pitch

Extend the LearningRateFinder callback so that it can be invoked throughout training, with a user-defined interval (probably in epochs), start_epoch, end_epoch, etc.
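
The scheduling logic the pitch asks for could look roughly like this. The parameter names (`interval`, `start_epoch`, `end_epoch`) follow the pitch; this is a hypothetical helper, not an existing Lightning API.

```python
def should_rerun_finder(epoch, interval=1, start_epoch=0, end_epoch=None):
    """Return True if the LR finder should re-run at the start of `epoch`.

    interval    -- re-run every `interval` epochs (hypothetical parameter)
    start_epoch -- first epoch at which re-running is allowed
    end_epoch   -- last epoch at which re-running is allowed (None = no limit)
    """
    if epoch < start_epoch:
        return False
    if end_epoch is not None and epoch > end_epoch:
        return False
    return (epoch - start_epoch) % interval == 0

# e.g. re-search every 2 epochs between epochs 2 and 8
rerun_epochs = [e for e in range(10)
                if should_rerun_finder(e, interval=2, start_epoch=2, end_epoch=8)]
# rerun_epochs == [2, 4, 6, 8]
```

A callback implementing the pitch would call this check in its epoch-start hook and trigger the LR search whenever it returns True.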

Alternatives

Currently it is possible to train for one epoch, save a checkpoint, re-initialize the model from the latest checkpoint, run the callback to find the optimal learning rate, and repeat. However, this is indirect and slows training down too much.

Additional context

No response

cc @Borda