Some questions about the training settings

Question


nullxjx opened this issue 2 years ago

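Hi, I've recently made a few modified versions based on your code. At first they performed better than yours, but later they stopped improving. I suspect this is due to the training settings, since many people say vision transformers are sensitive to the training strategy.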

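Swin Transformer uses AdamW as its optimizer; why is Adam used here? Swin Transformer uses a cosine LR_SCHEDULER; why was it changed to MultiStepLR here? Did you tune the milestones yourselves, and is there a reason for setting them to [250000,400000,450000,475000,500000]? Also, why isn't G_optimizer_clipgrad used, and why is G_optimizer_wd set to 0?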
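One detail helps frame the Adam vs. AdamW question: the two optimizers differ only in how weight decay is applied. Adam with an L2 penalty adds the penalty's gradient before the moment estimates, while AdamW shrinks the weights directly (decoupled weight decay). Assuming `G_optimizer_wd` is passed through as the optimizer's weight decay, a value of 0 makes the two updates identical. The plain-Python sketch below is an illustrative scalar re-implementation, not KAIR's or PyTorch's code; the helper names `adam_step` and `run`, the gradient sequence, and the default lr of 2e-4 are hypothetical.

```python
import math


def adam_step(theta, grad, m, v, t, lr=2e-4, betas=(0.9, 0.999), eps=1e-8,
              weight_decay=0.0, decoupled=False):
    """One Adam update on a scalar parameter (illustrative sketch).

    decoupled=False: Adam with an L2 penalty folded into the gradient.
    decoupled=True:  AdamW-style decoupled weight decay.
    Returns the updated (theta, m, v).
    """
    b1, b2 = betas
    if decoupled:
        # AdamW: shrink the weight directly, independent of the moment estimates
        theta = theta * (1.0 - lr * weight_decay)
    else:
        # Adam + L2: add the penalty's gradient before updating the moments
        grad = grad + weight_decay * theta
    m = b1 * m + (1.0 - b1) * grad          # first-moment estimate
    v = b2 * v + (1.0 - b2) * grad * grad   # second-moment estimate
    m_hat = m / (1.0 - b1 ** t)             # bias corrections
    v_hat = v / (1.0 - b2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v


def run(grads, weight_decay, decoupled, theta0=1.0):
    """Apply adam_step over a fixed gradient sequence (hypothetical helper)."""
    theta, m, v = theta0, 0.0, 0.0
    for t, g in enumerate(grads, start=1):
        theta, m, v = adam_step(theta, g, m, v, t,
                                weight_decay=weight_decay, decoupled=decoupled)
    return theta


grads = [0.5, -0.2, 0.3, 0.1, -0.4]
print(run(grads, 0.0, False) == run(grads, 0.0, True))  # identical when wd = 0
print(run(grads, 0.05, False), run(grads, 0.05, True))  # diverge once wd > 0
```

Note that PyTorch's `torch.optim.AdamW` defaults to `weight_decay=0.01`, so swapping in AdamW without setting it explicitly would add weight decay rather than only change the optimizer.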

Another issue: running your code as-is, I could not reproduce the results reported in your paper on several datasets. Did you train on 800 or 900 DIV2K images?

I'd appreciate an answer, thanks~

Jingyun Liang · Answer 1 · Fri Dec 17 2021 22:54:10 GMT+0800 (China Standard Time)

In my experiments,
1, Adam is slightly better than AdamW.
2, MultiStepLR is slightly better than cosine.
3, The milestones are chosen by design: we reduce the number of iterations by half each time the lr is reduced.
4, G_optimizer_clipgrad has not been tested, but I don't think it has much impact.
5, G_optimizer_wd has not been tested either.
6, I used 800 DIV2K images. How large is the gap in your results?
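The two schedules discussed above can be compared directly. The sketch below computes the learning rate per iteration for step decay (the rule PyTorch's `MultiStepLR` uses: multiply by `gamma` once for each milestone already passed) and for cosine annealing without restarts. The milestones come from this thread; the base lr of 2e-4, `gamma=0.5`, and a 500k-iteration run are assumptions for illustration, so check the training-settings doc linked later in the thread for the actual values.

```python
import math
from bisect import bisect_right

# Milestones quoted in this thread; BASE_LR, GAMMA and TOTAL_ITERS are assumptions.
MILESTONES = [250000, 400000, 450000, 475000, 500000]
BASE_LR = 2e-4
GAMMA = 0.5
TOTAL_ITERS = 500000


def multistep_lr(it, base_lr=BASE_LR, milestones=MILESTONES, gamma=GAMMA):
    """Step decay: multiply by gamma once for every milestone reached so far."""
    return base_lr * gamma ** bisect_right(milestones, it)


def cosine_lr(it, base_lr=BASE_LR, total=TOTAL_ITERS, eta_min=0.0):
    """Cosine annealing from base_lr down to eta_min over `total` iterations."""
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * it / total))


for it in [0, 249999, 250000, 400000, 450000, 475000, 499999]:
    print(f"{it:>7d}  step={multistep_lr(it):.3e}  cosine={cosine_lr(it):.3e}")
```

With these assumed values, step decay holds the initial lr for the first 250k iterations and then halves it at each milestone, whereas cosine annealing starts decaying from iteration 0.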

XJX · Answer 2 · Sat Dec 18 2021 14:21:28 GMT+0800 (China Standard Time)

Our Set5, Set14 and B100 results match yours. We reproduced Urban100 twice and got 33.34 and 33.24, versus 33.40 in your paper; for Manga109 the two runs gave 39.51 and 39.49, versus 39.60 in your paper.
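To put the reported numbers in perspective, the snippet below computes the mean of the two reproduction runs and its gap to the paper, using only the PSNR values quoted above. The dictionary layout and the `psnr_gaps` helper are illustrative.

```python
# PSNR (dB) values quoted in this thread: the paper's number and two reproduction runs.
reported = {"Urban100": 33.40, "Manga109": 39.60}
reproduced = {"Urban100": [33.34, 33.24], "Manga109": [39.51, 39.49]}


def psnr_gaps(reported, reproduced):
    """Return {dataset: (mean of runs, paper - mean, paper - best run)}, rounded to 0.01 dB."""
    out = {}
    for name, runs in reproduced.items():
        mean = sum(runs) / len(runs)
        out[name] = (round(mean, 2),
                     round(reported[name] - mean, 2),
                     round(reported[name] - max(runs), 2))
    return out


for name, (mean, gap_mean, gap_best) in psnr_gaps(reported, reproduced).items():
    print(f"{name}: mean {mean:.2f} dB, gap {gap_mean:.2f} dB (best run {gap_best:.2f} dB)")
```

On these numbers the mean gap is about 0.11 dB on Urban100 and 0.10 dB on Manga109 (0.06 dB and 0.09 dB for the better run).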

Jingyun Liang · Answer 3 · Sat Dec 18 2021 19:58:37 GMT+0800 (China Standard Time)

I have no idea why your model performs worse on Urban100 and Manga109. You can also refer to a third-party unofficial implementation here.

KeatsHao · Answer 4 · Fri Mar 25 2022 11:13:56 GMT+0800 (China Standard Time)

Hi, what initial learning rate did you use in your experiments? I also made a modified version and found that it converges very slowly.

Jingyun Liang · Answer 5 · Wed Jun 08 2022 16:35:55 GMT+0800 (China Standard Time)

See https://github.com/cszn/KAIR/blob/master/docs/README_SwinIR.md for training settings.