yitu-opensource / T2T-ViT

ICCV2021, Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet

Hard to train

huixiancheng opened this issue · comments

Hi, dear @yuanli2333. I'm trying to use T2T-ViT as the backbone for downstream semantic segmentation tasks.
However, as we know, ViT backbones are very hard to train; the default number of training epochs on ImageNet is 300.
I have tried two different network structures with T2T-ViT-14.
The first was trained with the SGD optimizer and cosine warmup. After 120 epochs, the loss curve is as follows:
[Screenshot: loss curve of the 1st training]
The second was trained with the Adam optimizer and cosine warmup (not using timm.create_optimizer to set AdamW, since I need to set different learning rates for different blocks; a sketch of this setup follows the screenshot below). The learning-rate settings are similar to yours. After 40 epochs, the loss curve is as follows:
[Screenshot: loss curve of the 2nd training]
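
For reference, here is a minimal sketch of the kind of per-block learning-rate setup described above, in plain PyTorch instead of timm.create_optimizer. The module names, learning rates, and schedule lengths are illustrative assumptions, not my exact configuration; the chained schedulers require PyTorch >= 1.10.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for a T2T-ViT-14 backbone and a segmentation head;
# in practice these come from the downstream seg. framework.
model = nn.ModuleDict({
    "backbone": nn.Linear(384, 384),    # placeholder for T2T-ViT-14
    "decode_head": nn.Linear(384, 19),  # placeholder for the seg. head
})

# Different learning rates per block: lower for the pretrained backbone,
# higher for the freshly initialized head (values are assumptions).
param_groups = [
    {"params": model["backbone"].parameters(), "lr": 1e-5},
    {"params": model["decode_head"].parameters(), "lr": 1e-4},
]
optimizer = torch.optim.Adam(param_groups, weight_decay=0.05)

# Cosine schedule with a 5-epoch linear warmup out of 300 total epochs.
warmup = torch.optim.lr_scheduler.LinearLR(optimizer, start_factor=0.01, total_iters=5)
cosine = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=295)
scheduler = torch.optim.lr_scheduler.SequentialLR(optimizer, [warmup, cosine], milestones=[5])
```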
It looks like the second training is much better and the loss is still decreasing, but I'm not sure whether it's on the right path. (By my calculation, it will take 6 days to train 300 epochs on a single 3090 GPU, so I don't have time for trial and error. :sob::sob::sob:)
Could you show me your training log as a reference or give me some advice? Thank you very much.

Is it a must to use AdamW as the optimizer? If it is, I will use it.

Hi, I have uploaded the training log of T2T-ViT-14 here. You can compare it with your training.

I think your loss curve is normal.
Empirically, we use Adam or AdamW for vision transformers; SGD can work but seems no better than Adam/AdamW.
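
For example, here is a minimal sketch of switching to AdamW while keeping per-parameter-group learning rates; the placeholder module and hyperparameter values are assumptions, not the repo's exact recipe. AdamW accepts the same param-group list as Adam, so timm.create_optimizer is not required:

```python
import torch
import torch.nn as nn

model = nn.Linear(384, 1000)  # placeholder module for illustration

# AdamW accepts the same per-parameter-group list as Adam, so per-block
# learning rates still work without timm.create_optimizer.
optimizer = torch.optim.AdamW(
    [{"params": model.parameters(), "lr": 5e-4}],
    betas=(0.9, 0.999),   # values assumed, matching common ViT recipes
    weight_decay=0.05,    # decoupled weight decay, unlike plain Adam's L2 term
)
```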

Thank you very much! :bow: