hustvl / YOLOS

[NeurIPS 2021] You Only Look at One Sequence

Home Page: https://arxiv.org/abs/2106.00666

Small learning rate value

davidnvq opened this issue · comments

❔Question

Thank you for your great work examining transformers for object detection. My question is: why do you start with such a small learning rate (2.5 × 10⁻⁵)? There is no explanation in the paper. My first guess is that you inherited the setting from the DETR framework.

Have you tried larger learning rates? To speed up training with more GPUs, is there a rule for scaling up the learning rate for YOLOS, based on your experiments, without losing performance?

Many thanks.

Hi @davidnvq, thanks for your interest in YOLOS.

We haven't had many chances to experiment. We found that YOLOS can converge with a 5 × 10⁻⁵ or 10 × 10⁻⁵ lr, but gives less competitive results.

For the lr scaling and large scale training, please refer to facebookresearch/detr#48 (comment) and hustvl/QueryInst#12 (comment).
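The linked DETR discussion points to the linear scaling rule, a common convention for multi-GPU training: grow the learning rate in proportion to the global batch size. A minimal sketch of that rule is below; the base values are illustrative assumptions, not a confirmed YOLOS recipe.

```python
def scale_lr(base_lr: float, base_batch_size: int, new_batch_size: int) -> float:
    """Linear scaling rule: lr grows in proportion to the global batch size.

    This is a common heuristic (popularized by Goyal et al.'s large-batch
    training work), not an official YOLOS prescription.
    """
    return base_lr * new_batch_size / base_batch_size

# Hypothetical example: going from a global batch of 8 to 64 (e.g. 8x more GPUs)
scaled = scale_lr(base_lr=2.5e-5, base_batch_size=8, new_batch_size=64)
print(scaled)  # 2e-4
```

In practice the rule is usually paired with a warmup phase, and as the comments above suggest, large-batch transformer detectors can still lose accuracy even with scaled learning rates, so some re-tuning is typically needed.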

Thanks for your feedback. Let me close the issue.