Question about the cosine annealing learning rate decay

Question

aFewThings opened this issue 3 years ago · comments

Hi, thanks for sharing your study.
While I was reading your paper, I wondered why you were using the cosine annealing scheduler.

Is there any special reason why you chose this scheduler?
Did you train all the other compared models with this scheduler in the COCO and MPII experiments?

I'm just asking because this scheduler is unfamiliar to me in the human pose estimation domain.

senius · Answer 1 · Mon May 10 2021 12:03:57 GMT+0800 (China Standard Time)

Hi, @aFewThings, thanks for your interest.

I also tried the training schedule from the HRNet codebase, with MultiStepLR decay, to train TransPose. But model performance seemed to be sensitive to the initial learning rate and the milestone epochs, and some models could not be trained well at all. I chose the cosine annealing schedule because, with the same initial learning rate, some models performed better than they did with MultiStepLR decay, and all the models showed relatively good performance under this one schedule. Note, though, that this schedule may not be optimal, and you can also train with others, such as the training schedule of DETR.
I also used this schedule to train SimpleBaseline-Res50 with Darkpose post-processing; it reaches 72.1 AP on the COCO val set (a +0.1 improvement). You can see it in Section 4.1 of the updated paper.
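
For readers comparing the two decay rules mentioned above, here is a minimal sketch in plain Python of their closed forms: step decay at fixed milestones versus cosine annealing. The values `BASE_LR`, `MIN_LR`, `TOTAL_EPOCHS`, and `MILESTONES`, along with the helper names, are illustrative assumptions and not the settings used for TransPose or HRNet. In PyTorch the corresponding schedulers are `torch.optim.lr_scheduler.MultiStepLR` and `torch.optim.lr_scheduler.CosineAnnealingLR`.

```python
import math
from bisect import bisect_right

# Illustrative values only (assumptions, not the paper's settings).
BASE_LR = 1e-3
MIN_LR = 1e-5
TOTAL_EPOCHS = 200
MILESTONES = [120, 170]


def multistep_lr(epoch, base_lr, milestones, gamma=0.1):
    """Step decay: multiply the rate by gamma once per milestone reached."""
    passed = bisect_right(sorted(milestones), epoch)
    return base_lr * gamma ** passed


def cosine_annealing_lr(epoch, base_lr, total_epochs, min_lr=0.0):
    """Cosine annealing: decay smoothly from base_lr to min_lr over total_epochs."""
    cosine = math.cos(math.pi * epoch / total_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + cosine)


# Print both schedules side by side at a few epochs.
for epoch in (0, 50, 100, 119, 120, 150, 170, 200):
    step = multistep_lr(epoch, BASE_LR, MILESTONES)
    cos = cosine_annealing_lr(epoch, BASE_LR, TOTAL_EPOCHS, MIN_LR)
    print(f"epoch {epoch:3d}  multistep {step:.2e}  cosine {cos:.2e}")
```

The step schedule depends on where the milestones are placed, while the cosine schedule is determined by the initial rate, the minimum rate, and the total number of epochs, so it has fewer schedule-specific settings to tune.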

EunBeen Kim · Answer 2 · Mon May 10 2021 12:21:35 GMT+0800 (China Standard Time)

Thank you!