kakaobrain / rq-vae-transformer

The official implementation of Autoregressive Image Generation using Residual Quantization (CVPR '22)


What is the reason for using cosine annealing scheduler for stage1 if min_lr = optimizer lr

fostiropoulos opened this issue · comments

It is not clear whether this is a bug in the implementation or intentional: the cosine annealing schedule never kicks in because the optimizer's lr is equal to the scheduler's minimum lr. Please advise.

@fostiropoulos ,
we implemented cosine annealing while exploring the optimal training settings for RQ-VAE, including a cosine learning rate schedule.
However, a cosine schedule requires the total number of training epochs to be fixed in advance, and we found that RQ-VAE keeps improving as training proceeds. Thus, instead of changing the scheduler implementation, we set the minimum lr equal to the base lr, which turns the cosine schedule into a constant lr schedule with warm-up.
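Below is a minimal sketch (not the repository's actual trainer code) illustrating the degenerate behavior, using PyTorch's built-in `CosineAnnealingLR` as a stand-in and a hypothetical base lr. When `eta_min` equals the optimizer's base lr, the cosine term has zero amplitude, so the lr stays constant at every step:

```python
# Sketch: cosine annealing collapses to a constant schedule when eta_min == base lr.
import torch

model = torch.nn.Linear(4, 4)
base_lr = 4e-5  # hypothetical value for illustration

optimizer = torch.optim.Adam(model.parameters(), lr=base_lr)

# eta_min == base_lr  =>  lr = eta_min + (base_lr - eta_min) * cos(...)/2 = base_lr
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=100, eta_min=base_lr
)

for step in range(5):
    optimizer.step()
    scheduler.step()
    print(step, scheduler.get_last_lr())  # always [4e-05]
```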