hustvl / YOLOS

[NeurIPS 2021] You Only Look at One Sequence

Home Page: https://arxiv.org/abs/2106.00666

Small learning rate value

davidnvq opened this issue · comments

❔Question

Thank you for your great work examining transformers for object detection. My question is: why do you start with such a small learning rate (2.5 × 10⁻⁵)? There is no explanation in the paper. My first guess is that you inherited the setting from the DETR framework.

Have you tried larger learning rates? To speed up training with more GPUs, is there a rule for scaling up the learning rate for YOLOS, based on your experiments, without losing performance?

Many thanks.

Hi @davidnvq, thanks for your interest in YOLOS.

We haven't had many chances to experiment. We found that YOLOS can converge with a 5 × 10⁻⁵ or 10 × 10⁻⁵ lr, but gives less competitive results.

For the lr scaling and large scale training, please refer to facebookresearch/detr#48 (comment) and hustvl/QueryInst#12 (comment).
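The linked DETR discussion points to the linear scaling rule, a common convention for multi-GPU training: grow the learning rate in proportion to the global batch size. A minimal sketch of that rule is below; the base values are illustrative assumptions, not a confirmed YOLOS recipe.

```python
def scale_lr(base_lr: float, base_batch_size: int, new_batch_size: int) -> float:
    """Linear scaling rule: lr grows in proportion to the global batch size.

    This is a common heuristic (popularized by Goyal et al.'s large-batch
    training work), not an official YOLOS prescription.
    """
    return base_lr * new_batch_size / base_batch_size

# Hypothetical example: going from a global batch of 8 to 64 (e.g. 8x more GPUs)
scaled = scale_lr(base_lr=2.5e-5, base_batch_size=8, new_batch_size=64)
print(scaled)  # 2e-4
```

In practice the rule is usually paired with a warmup phase, and as the comments above suggest, large-batch transformer detectors can still lose accuracy even with scaled learning rates, so some re-tuning is typically needed.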

Thanks for your feedback. Let me close the issue.