The training code fails to run with the latest timm
kevin-Abbring opened this issue · comments
Rog commented
following timm: set wd as 0 for bias and norm layers
param_groups = optim_factory.add_weight_decay(model_without_ddp, args.weight_decay)
optimizer = torch.optim.AdamW(param_groups, lr=args.lr, betas=(0.9, 0.95))
print(optimizer)
loss_scaler = NativeScaler()
The `add_weight_decay` function no longer exists; replace it with `optim_factory.param_groups_weight_decay`.
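A minimal sketch of the fix, written so it works across timm versions: try the new name first and fall back to a local reimplementation of the same grouping logic (bias and norm parameters get zero weight decay). The fallback implementation here is an assumption modeled on timm's documented behavior, not copied from timm itself.

```python
import torch
import torch.nn as nn

try:
    # timm >= 0.6 renamed the helper to param_groups_weight_decay
    from timm.optim import param_groups_weight_decay
except ImportError:
    # Fallback sketch with the same signature/behavior (assumed):
    # biases and 1-D params (norm layers) get weight_decay = 0
    def param_groups_weight_decay(model, weight_decay=1e-5, no_weight_decay_list=()):
        decay, no_decay = [], []
        for name, param in model.named_parameters():
            if not param.requires_grad:
                continue
            if param.ndim <= 1 or name.endswith(".bias") or name in no_weight_decay_list:
                no_decay.append(param)
            else:
                decay.append(param)
        return [
            {"params": no_decay, "weight_decay": 0.0},
            {"params": decay, "weight_decay": weight_decay},
        ]

# Toy model standing in for model_without_ddp
model = nn.Sequential(nn.Linear(8, 8), nn.LayerNorm(8))
param_groups = param_groups_weight_decay(model, weight_decay=0.05)
optimizer = torch.optim.AdamW(param_groups, lr=1e-3, betas=(0.9, 0.95))
print(optimizer)
```

In the training script, the only change needed is swapping `optim_factory.add_weight_decay(...)` for `param_groups_weight_decay(...)` with the same model and weight-decay arguments.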