frank-xwang / RIDE-LongTailRecognition

[ICLR 2021 Spotlight] Code release for "Long-tailed Recognition by Routing Diverse Distribution-Aware Experts."

Mismatched hyper-parameter settings for

liming-ai opened this issue

Hi @TonyLianLong,

I noticed that the hyper-parameters reported in your paper are the same as LDAM's:

  1. weight decay is 2e-4
  2. lr decay steps are 120 and 160

but in your config files they are changed to:

  1. weight decay is 5e-4
  2. lr decay steps are 160 and 180

I am quite confused; could you please explain this?

Hi,
For point 2, I believe both LDAM and our code use 160 and 180 (not 120 and 160); we keep the lr decay steps unchanged. Here is the LDAM source code: https://github.com/kaidic/LDAM-DRW/blob/master/cifar_train.py#L366
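
As a concrete illustration of the schedule being described, here is a minimal PyTorch sketch of a step decay at epochs 160 and 180. The model, base lr, momentum, and gamma are placeholder assumptions, not values taken from either repo:

```python
import torch
from torch.optim.lr_scheduler import MultiStepLR

model = torch.nn.Linear(10, 2)  # placeholder model, not the actual RIDE network
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)  # lr/momentum assumed

# Decay the learning rate at epochs 160 and 180, as described above.
# gamma=0.1 (10x decay at each milestone) is a common default and an assumption here.
scheduler = MultiStepLR(optimizer, milestones=[160, 180], gamma=0.1)

for epoch in range(200):
    # ... one epoch of training ...
    scheduler.step()  # advance the schedule once per epoch
```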

For point 1, weight decay is something that can legitimately differ across network architectures. For example, LDAM changes the template's 1e-4 to 2e-4 (see https://github.com/kaidic/LDAM-DRW/blob/master/cifar_train.py#L54-L56), probably because of optimization differences. Since we use a different architecture and optimization setup, it makes sense to tune weight decay. In my experience, weight decay does not influence accuracy much, especially on large datasets.
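
For readers following along, here is a minimal sketch of where this value enters in PyTorch: weight decay is an optimizer argument, so switching between the two settings is a one-argument change. The model, lr, and momentum below are illustrative assumptions:

```python
import torch

model = torch.nn.Linear(10, 2)  # placeholder model

# In PyTorch, weight decay is passed to the optimizer, so the two settings
# discussed above differ by a single value:
optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.1,             # illustrative base lr (an assumption)
    momentum=0.9,       # illustrative momentum (an assumption)
    weight_decay=5e-4,  # value in the RIDE configs; LDAM uses 2e-4
)
```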

Sorry for the confusion; please use the hyperparameters in our configs as the final reference for hyperparameter tuning.

Got it, thanks a lot!