frank-xwang / RIDE-LongTailRecognition

[ICLR 2021 Spotlight] Code release for "Long-tailed Recognition by Routing Diverse Distribution-Aware Experts."

Mismatched hyper-parameter settings for

liming-ai opened this issue

Hi @TonyLianLong,

I noticed that the hyper-parameters reported in your paper are the same as LDAM's:

  1. weight decay is 2e-4
  2. lr decay steps are 120 and 160

but in your config files they are changed to:

  1. weight decay is 5e-4
  2. lr decay steps are 160 and 180

I am quite confused; could you please explain this?

Hi,
For point 2, I believe both LDAM and our code use 160 and 180 (not 120 and 160); we keep the lr decay steps unchanged. Here is the LDAM source code: https://github.com/kaidic/LDAM-DRW/blob/master/cifar_train.py#L366
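
As a concrete illustration of the schedule being described, here is a minimal PyTorch sketch of a step decay at epochs 160 and 180. The model, base lr, momentum, and gamma are placeholder assumptions, not values taken from either repo:

```python
import torch
from torch.optim.lr_scheduler import MultiStepLR

model = torch.nn.Linear(10, 2)  # placeholder model, not the actual RIDE network
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)  # lr/momentum assumed

# Decay the learning rate at epochs 160 and 180, as described above.
# gamma=0.1 (10x decay at each milestone) is a common default and an assumption here.
scheduler = MultiStepLR(optimizer, milestones=[160, 180], gamma=0.1)

for epoch in range(200):
    # ... one epoch of training ...
    scheduler.step()  # advance the schedule once per epoch
```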

For point 1, weight decay is something that can legitimately differ across network architectures. For example, LDAM changes the template's 1e-4 to 2e-4 (see https://github.com/kaidic/LDAM-DRW/blob/master/cifar_train.py#L54-L56), probably because of optimization differences. Since we use a different architecture and optimization setup, it makes sense to tune weight decay. In my experience, weight decay does not influence accuracy much, especially on large datasets.
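
For readers following along, here is a minimal sketch of where this value enters in PyTorch: weight decay is an optimizer argument, so switching between the two settings is a one-argument change. The model, lr, and momentum below are illustrative assumptions:

```python
import torch

model = torch.nn.Linear(10, 2)  # placeholder model

# In PyTorch, weight decay is passed to the optimizer, so the two settings
# discussed above differ by a single value:
optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.1,             # illustrative base lr (an assumption)
    momentum=0.9,       # illustrative momentum (an assumption)
    weight_decay=5e-4,  # value in the RIDE configs; LDAM uses 2e-4
)
```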

Sorry for the confusion; please use the hyperparameters in our configs as the final reference for hyperparameter tuning.

Got it, thanks a lot!