yitu-opensource / T2T-ViT

ICCV2021, Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet

ImageNet100 very low accuracy

Evgeneus opened this issue · comments

Dear authors,

I would like to train T2T-ViT on ImageNet100 using 2 GPUs, but I get only 8.5% top-1 accuracy after 200 epochs, and the training loss also stays high. Do you know what the reason could be?

  • I changed the number of classes in the train file (to match 100 classes)
  • running script:
    OMP_NUM_THREADS=16 CUDA_VISIBLE_DEVICES=0,1 bash distributed_train.sh 2 /data/datasets/imagenet-100/ --model T2t_vit_14 -b 128 --lr 1e-3 --weight-decay .03 --cutmix 0.0 --reprob 0.25 --img-size 224
  • some outputs:
    epoch,train_loss,eval_loss,eval_top1,eval_top5
    194,4.363854191519997,4.067602333831787,8.519999993896484,26.46000007324219
    195,4.340610720894554,4.064138192749024,8.59999998779297,26.379999963378907
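To put the numbers above in context, it can help to compare them with what a uniform random guess over 100 classes would give: a cross-entropy loss of ln(100) ≈ 4.605, 1% top-1 and 5% top-5. The sketch below is only an illustration, not part of the repository; the helper names `chance_baseline` and `compare` are hypothetical, and the two rows are the ones quoted above, rounded for readability. The training loss is left out of the comparison because it may not be directly comparable if augmentation or label smoothing is active.

```python
import csv
import io
import math

NUM_CLASSES = 100  # ImageNet100, as stated in the issue

# The two summary rows quoted in the issue, rounded for readability.
SUMMARY = """epoch,train_loss,eval_loss,eval_top1,eval_top5
194,4.3639,4.0676,8.52,26.46
195,4.3406,4.0641,8.60,26.38
"""


def chance_baseline(num_classes):
    """Return (cross-entropy loss, top-1 %, top-5 %) of a uniform random guess."""
    loss = math.log(num_classes)
    top1 = 100.0 / num_classes
    top5 = 100.0 * min(5, num_classes) / num_classes
    return loss, top1, top5


def compare(summary_text, num_classes):
    """Compare each logged epoch against the chance-level baseline."""
    base_loss, base_top1, _ = chance_baseline(num_classes)
    rows = []
    for row in csv.DictReader(io.StringIO(summary_text)):
        rows.append({
            "epoch": int(row["epoch"]),
            # How far the eval loss has dropped below ln(num_classes).
            "eval_loss_gap": base_loss - float(row["eval_loss"]),
            # Top-1 accuracy as a multiple of the chance-level accuracy.
            "top1_ratio": float(row["eval_top1"]) / base_top1,
        })
    return rows


if __name__ == "__main__":
    print("chance baseline:", chance_baseline(NUM_CLASSES))
    for entry in compare(SUMMARY, NUM_CLASSES):
        print(entry)
```

An eval loss only about 0.5 below ln(100) after roughly 200 epochs means the model is better than chance but has learned very little.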

Hi,

We also trained T2T-ViT from scratch on other datasets such as CIFAR100 and got reasonable results (77%-80%), so without more information I cannot tell why your training does not work on ImageNet100.

You can also borrow training settings from our transfer learning setup or from other implementations like this one, which trains for only 60 epochs but still achieves accuracy > 70%.
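One piece of information that could help narrow this down is whether the training and validation splits of the subset contain the same class folders. The sketch below is a generic check, not something from the repository. It assumes a folder-per-class layout with split directories named `train` and `val` under the dataset root; those names, and the helpers `class_dirs` and `check_splits`, are assumptions and may need adjusting.

```python
import os


def class_dirs(split_dir):
    """Return the sorted names of the class subfolders in one split."""
    return sorted(
        name for name in os.listdir(split_dir)
        if os.path.isdir(os.path.join(split_dir, name))
    )


def check_splits(root, train_name="train", val_name="val"):
    """Summarize class-folder agreement between two splits.

    train_name and val_name are assumed directory names, not taken
    from the issue; pass the names your dataset actually uses.
    """
    train = class_dirs(os.path.join(root, train_name))
    val = class_dirs(os.path.join(root, val_name))
    return {
        "num_train_classes": len(train),
        "num_val_classes": len(val),
        "only_in_train": sorted(set(train) - set(val)),
        "only_in_val": sorted(set(val) - set(train)),
    }
```

For a 100-class subset, both counts would be expected to equal 100 and both difference lists to be empty. A mismatch would mean that the label indices differ between training and evaluation.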