idstcv / GPU-Efficient-Networks

Training details

ZichaoGuo opened this issue · comments

Could you release the training code, or describe the training details? Many details are missing from the paper.

There are sufficient details in our paper.

What is your learning rate schedule? I didn't see it in the paper.
“The final networks are trained up to 480 epochs with label-smoothing [Szegedy et al., 2016], mix-up [Zhang et al., 2018], random-erase [Zhong et al., 2020] and auto-augmentation [Cubuk et al., 2019]. Due to the space limitation, more details and results could be found in appendix.”
I didn't see the training details in the appendix. Could you share more here?
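For reference, here is a minimal PyTorch/torchvision sketch of how the four regularizers named in the quoted paper text are typically wired together. The specific values (smoothing 0.1, mixup alpha 0.2, erase probability 0.25) are common defaults, not settings confirmed by the authors.

```python
# Sketch of label-smoothing, mix-up, random-erase and auto-augmentation.
# All hyperparameter values here are common defaults, NOT confirmed by the paper.
import torch
import torch.nn as nn
from torchvision import transforms
from torchvision.transforms import AutoAugment, AutoAugmentPolicy

train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    AutoAugment(AutoAugmentPolicy.IMAGENET),   # auto-augmentation
    transforms.ToTensor(),
    transforms.RandomErasing(p=0.25),          # random-erase (applied on tensors)
])

# Label smoothing is built into CrossEntropyLoss since PyTorch 1.10.
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)

def mixup(x, y, alpha=0.2):
    """Mix-up: blend random pairs of images, return both label sets and the mix weight."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0))
    return lam * x + (1 - lam) * x[perm], y, y[perm], lam

# Inside the training loop:
#   mixed, y_a, y_b, lam = mixup(images, labels)
#   outputs = model(mixed)
#   loss = lam * criterion(outputs, y_a) + (1 - lam) * criterion(outputs, y_b)
```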

I came across your GENet and found it very interesting, so I tried to reproduce the paper's results, but the training details in the paper are not entirely clear. With batch size 1024, lr 0.5, weight decay 1e-4, 360 epochs, a 5-epoch warmup, cosine learning-rate decay, and no dropout, I only reached 76.1 accuracy with the GENet-normal architecture.

Could you share the training strategy for GENet-normal, e.g. lr, batch size, weight decay, dropout rate, epochs, the learning-rate decay schedule, and whether warmup was used? I'd appreciate your help.

Have you managed to reproduce the GENet results? I've tried adjusting several training strategies, and there is still a large gap to the paper. If you haven't reproduced them either, I'll give up.

@pawopawo I couldn't reproduce them either; my results are about the same as yours.

We will update our draft this week to include more detailed training parameters. We use cosine learning-rate decay with a 5-epoch warm-up, weight decay 4e-5, lr = 0.1, and batch size 256.
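A minimal sketch of that schedule in PyTorch: SGD with lr 0.1 and weight decay 4e-5, a 5-epoch linear warm-up, then cosine decay. The 480-epoch total comes from the quoted paper text; momentum 0.9, per-epoch stepping, and the placeholder model are assumptions, not confirmed by the authors.

```python
# Warm-up + cosine learning-rate schedule with the hyperparameters stated above.
# momentum=0.9 and per-epoch scheduler stepping are assumptions.
import math
import torch

total_epochs, warmup_epochs = 480, 5

model = torch.nn.Linear(10, 10)  # placeholder; substitute GENet-normal here
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=4e-5)

def lr_lambda(epoch):
    if epoch < warmup_epochs:
        return (epoch + 1) / warmup_epochs              # linear warm-up over 5 epochs
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return 0.5 * (1.0 + math.cos(math.pi * progress))   # cosine decay to ~0

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

# for epoch in range(total_epochs):
#     train_one_epoch(...)  # batch size 256
#     scheduler.step()
```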