yitu-opensource / T2T-ViT

ICCV2021, Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet


Which hyperparameters should I change if I have a different input size?

chenwydj opened this issue · comments

I assume the current architecture-related hyperparameters (e.g., the kernel_size of the first few soft_split layers) are designed for 224x224 ImageNet images.

Which hyperparameters should I change if I have a different input size, say 64x64 ImageNet images?

Thank you very much!

We have tried training our model at a size of 384x384, and the hyperparameters in our training scripts achieve good results there; for example, T2T-ViT-14 reaches 83.3% top-1 accuracy. So for 64x64, I suggest you try our hyperparameters first.
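For reference, here is a minimal sketch (not code from this repository) of how the token grid produced by the soft-split stages scales with input size. The kernel/stride/padding values below are the defaults described in the T2T-ViT paper for 224x224 inputs (7/4/2, then 3/2/1 twice); treat them as assumptions and check the model definitions in this repo for the exact values. The computation is just the standard output-size formula for unfold/convolution.

```python
import math

# Assumed soft-split settings, taken from the paper's 224x224 configuration;
# verify against the repository's model code before relying on them.
SOFT_SPLITS = [
    dict(kernel=7, stride=4, padding=2),  # soft_split0
    dict(kernel=3, stride=2, padding=1),  # soft_split1
    dict(kernel=3, stride=2, padding=1),  # soft_split2
]

def token_grid(img_size: int) -> list[int]:
    """Side length of the token grid after each soft split,
    via the standard unfold/conv output-size formula."""
    sides = []
    side = img_size
    for s in SOFT_SPLITS:
        side = math.floor((side + 2 * s["padding"] - s["kernel"]) / s["stride"]) + 1
        sides.append(side)
    return sides

for size in (224, 384, 64):
    grids = token_grid(size)
    print(f"{size}x{size}: grids {grids} -> {grids[-1] ** 2} final tokens")

# Expected output:
# 224x224: grids [56, 28, 14] -> 196 final tokens
# 384x384: grids [96, 48, 24] -> 576 final tokens
# 64x64:   grids [16, 8, 4]   -> 16 final tokens
```

This suggests that 64x64 inputs yield only a 4x4 grid (16 tokens) under the default settings, far fewer than the 196 tokens at 224x224, which may be worth keeping in mind when reusing the 224x224 hyperparameters. Note also that the length of the position embedding depends on this final token count, so it needs to match (or be interpolated) whenever the input size changes.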

Thank you very much!