yitu-opensource / T2T-ViT

ICCV2021, Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet

Hard to train

huixiancheng opened this issue · comments

Hi, dear @yuanli2333. I'm trying to use T2T-ViT as the backbone for downstream semantic segmentation tasks.
However, as we know, ViT backbones are very hard to train; the default number of training epochs on ImageNet is 300.
I have tried two different network structures with T2T-ViT-14.
The first was trained with the SGD optimizer and cosine warmup. After 120 epochs, the loss curve is as follows:
[Screenshot: loss curve of the 1st training]
The second was trained with the Adam optimizer and cosine warmup (not using timm.create_optimizer to set AdamW, since I need to set different learning rates for different blocks; a sketch of this setup follows the screenshot below). The learning-rate settings are similar to yours. After 40 epochs, the loss curve is as follows:
[Screenshot: loss curve of the 2nd training]
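
For reference, here is a minimal sketch of the kind of per-block learning-rate setup described above, in plain PyTorch instead of timm.create_optimizer. The module names, learning rates, and schedule lengths are illustrative assumptions, not my exact configuration; the chained schedulers require PyTorch >= 1.10.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for a T2T-ViT-14 backbone and a segmentation head;
# in practice these come from the downstream seg. framework.
model = nn.ModuleDict({
    "backbone": nn.Linear(384, 384),    # placeholder for T2T-ViT-14
    "decode_head": nn.Linear(384, 19),  # placeholder for the seg. head
})

# Different learning rates per block: lower for the pretrained backbone,
# higher for the freshly initialized head (values are assumptions).
param_groups = [
    {"params": model["backbone"].parameters(), "lr": 1e-5},
    {"params": model["decode_head"].parameters(), "lr": 1e-4},
]
optimizer = torch.optim.Adam(param_groups, weight_decay=0.05)

# Cosine schedule with a 5-epoch linear warmup out of 300 total epochs.
warmup = torch.optim.lr_scheduler.LinearLR(optimizer, start_factor=0.01, total_iters=5)
cosine = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=295)
scheduler = torch.optim.lr_scheduler.SequentialLR(optimizer, [warmup, cosine], milestones=[5])
```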
It looks like the second training is much better and the loss is still decreasing, but I'm not sure whether it's on the right path. (By my calculation, it will take 6 days to train 300 epochs on a single 3090 GPU, so I don't have time for trial and error. :sob::sob::sob:)
Could you show me your training log as a reference or give me some advice? Thank you very much.

Is it a must to use AdamW as the optimizer? If it is, I will use it.

Hi, I have uploaded the training log of T2T-ViT-14 here. You can compare it with your training.

I think your loss curve is normal.
Empirically, we use Adam or AdamW for vision transformers; SGD can work but seems no better than Adam/AdamW.
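
For example, here is a minimal sketch of switching to AdamW while keeping per-parameter-group learning rates; the placeholder module and hyperparameter values are assumptions, not the repo's exact recipe. AdamW accepts the same param-group list as Adam, so timm.create_optimizer is not required:

```python
import torch
import torch.nn as nn

model = nn.Linear(384, 1000)  # placeholder module for illustration

# AdamW accepts the same per-parameter-group list as Adam, so per-block
# learning rates still work without timm.create_optimizer.
optimizer = torch.optim.AdamW(
    [{"params": model.parameters(), "lr": 5e-4}],
    betas=(0.9, 0.999),   # values assumed, matching common ViT recipes
    weight_decay=0.05,    # decoupled weight decay, unlike plain Adam's L2 term
)
```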

Thank you very much! :bow: