yitu-opensource / T2T-ViT

ICCV2021, Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet

resizing images for feature visualization

kahnchana opened this issue · comments

Hi, I'm really interested in this work, and was looking at the feature visualization section.

In this code, how do you feed larger images to the model (e.g. 512x512 inputs to a ViT trained at 384x384)? Do you make any modifications?

Hi,

You can interpolate the position embedding for a different image size with the function here.

Or use T2T-ViT directly as described in the usage section; the interpolation is already built into the 'load_for_transfer_learning' function.
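
For anyone reading later, here is a minimal sketch of what interpolating a position embedding means. It is not the repo's function: `resize_pos_embed` and its parameters are hypothetical names, and it uses plain Python lists so it runs without dependencies. It assumes a single class token stored first, followed by a square, row-major grid of patch tokens, and it uses bilinear interpolation with corner alignment. With a total patch stride of 16 (an assumption), a 384 model has a 24x24 grid and a 512 input needs 32x32.

```python
def resize_pos_embed(pos_embed, old_grid, new_grid, num_prefix_tokens=1):
    """Bilinearly resize a ViT position embedding to a new patch grid.

    pos_embed: list of token vectors (lists of floats), laid out as
               [prefix tokens..., old_grid * old_grid patch tokens, row-major].
    Returns a new list with the prefix tokens unchanged and
    new_grid * new_grid interpolated patch tokens.
    """
    prefix = [list(v) for v in pos_embed[:num_prefix_tokens]]
    grid = pos_embed[num_prefix_tokens:]
    if len(grid) != old_grid * old_grid:
        raise ValueError("pos_embed length does not match old_grid")
    dim = len(grid[0])

    # Map output indices onto input coordinates so that the corners line up.
    scale = (old_grid - 1) / (new_grid - 1) if new_grid > 1 else 0.0

    resized = []
    for r in range(new_grid):
        y = r * scale
        y0 = int(y)
        y1 = min(y0 + 1, old_grid - 1)
        wy = y - y0
        for c in range(new_grid):
            x = c * scale
            x0 = int(x)
            x1 = min(x0 + 1, old_grid - 1)
            wx = x - x0
            # The four neighbouring tokens in the old grid.
            tl = grid[y0 * old_grid + x0]
            tr = grid[y0 * old_grid + x1]
            bl = grid[y1 * old_grid + x0]
            br = grid[y1 * old_grid + x1]
            resized.append([
                (1 - wy) * ((1 - wx) * tl[d] + wx * tr[d])
                + wy * ((1 - wx) * bl[d] + wx * br[d])
                for d in range(dim)
            ])
    return prefix + resized


# Toy example: one class token plus a 2x2 grid, resized to 3x3.
toy = [[9.0], [0.0], [1.0], [2.0], [3.0]]
out = resize_pos_embed(toy, old_grid=2, new_grid=3)
```

In a real PyTorch model the same idea is usually done by reshaping the patch part of the embedding to (1, dim, H, W) and calling `torch.nn.functional.interpolate` on it, then concatenating the class token back.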

Thanks a lot for the info. Got it working.