[EVA02] LearningRateDecayOptimizerConstructor 'decay_type': 'vit_wise'
billbliss3 opened this issue · comments
It seems there is no 'vit_wise' type among the LearningRateDecayOptimizerConstructor variants.
@billbliss3 We didn't release the code for LearningRateDecayOptimizerConstructor. You will need to implement it yourself.
@exiawsh
Hi, I am confused about the parameter 'weight_decay=1e-7' in vit_wise mode.
Why do the two versions of the config differ so much in weight_decay: 1e-2 vs. 1e-7?
optimizer = dict(
    type='AdamW',
    lr=4e-4,  # bs 8: 2e-4 || bs 16: 4e-4
    paramwise_cfg=dict(
        custom_keys={
            'img_backbone': dict(lr_mult=0.1),
        }),
    weight_decay=0.01)
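For reference, the custom_keys entry in this first config simply scales the base LR for any parameter whose name contains 'img_backbone'. A minimal illustration of the effective rates it implies (plain arithmetic, not mmcv internals):

```python
# Effective learning rates implied by the first config (bs 16 setting).
base_lr = 4e-4          # optimizer lr
backbone_lr_mult = 0.1  # custom_keys 'img_backbone' lr_mult

backbone_lr = base_lr * backbone_lr_mult  # img_backbone params train at 4e-5
other_lr = base_lr                        # all other params train at 4e-4
```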
optimizer = dict(
    constructor='LearningRateDecayOptimizerConstructor',
    type='AdamW',
    lr=1e-4, betas=(0.9, 0.999), weight_decay=1e-7,
    paramwise_cfg={
        'decay_rate': 0.9,
        'head_decay_rate': 4.0,
        'decay_type': 'vit_wise',
        'num_layers': 24,
    })
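Since the constructor isn't released, here is a rough sketch of how a 'vit_wise' decay type might turn this config into per-group LR multipliers. This is an assumption based on common layer-wise decay constructors (BEiT/ConvNeXt style), not the actual EVA02 code; the function name and the exact exponent convention are hypothetical.

```python
def vit_wise_lr_scales(num_layers, decay_rate, head_decay_rate):
    """Hypothetical per-layer LR multipliers for a ViT backbone.

    Index 0 is the patch embedding, indices 1..num_layers are the
    transformer blocks, and the last entry is the detection head.
    Shallower layers get geometrically smaller multipliers.
    """
    scales = [decay_rate ** (num_layers - i) for i in range(num_layers + 1)]
    scales.append(head_decay_rate)  # head LR is amplified, not decayed
    return scales

# With the config above (decay_rate=0.9, head_decay_rate=4.0, num_layers=24),
# the deepest transformer block keeps the base LR (multiplier 1.0) and the
# head runs at 4x the base LR of 1e-4, i.e. 4e-4.
```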
@billbliss3 They give similar results; weight decay is not very important here.
Hi, could you explain the meaning of the parameter head_decay_rate? I am a little confused.
@exiawsh
'head_decay_rate': 4.0
@billbliss3 The learning rate of the detection head (4e-4) is 4x the base learning rate (1e-4).
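In other words, head_decay_rate acts as an LR multiplier for the head's parameter group, so both configs end up training the detection head at the same effective rate. The arithmetic (my reading of the reply above, not released code):

```python
base_lr = 1e-4           # second config's optimizer lr
head_decay_rate = 4.0    # paramwise_cfg 'head_decay_rate'

head_lr = base_lr * head_decay_rate  # 4e-4, matching the first config's lr
```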