faustomorales / vit-keras

Keras implementation of ViT (Vision Transformer)

different image size in fine-tuning

captainst opened this issue

Hi there,

I saw that the implementation uses a convolution to generate fixed-size hidden vectors from a variable-size input image. That's brilliant!
However, I am wondering whether the fine-tuning results would be degraded when using a different input image size, say 224, rather than the official input size of 384, as shown in your example.

Many thanks !

I will wait for a response 😄

Based on my experience, I did not see any performance degradation at 224 resolution compared with 384.
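
For context on what changes between the two resolutions, here is a minimal sketch of the token count a ViT processes at each input size. It assumes a patch size of 16 and a single class token (as in the ViT-B/16 style of model); neither value is stated in this thread, and the helper name `vit_sequence_length` is hypothetical, not part of vit-keras. The convolution keeps the per-patch hidden size fixed, but the number of patches, and therefore the number of position embeddings the model needs, depends on the image size.

```python
def vit_sequence_length(image_size: int, patch_size: int = 16) -> int:
    """Return the number of tokens for a square image.

    Assumes non-overlapping square patches plus one class token.
    patch_size=16 is an illustrative assumption, not taken from the thread.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be a multiple of patch_size")
    grid = image_size // patch_size  # patches along one side
    return grid * grid + 1           # all patches + the class token


for size in (384, 224):
    print(size, vit_sequence_length(size))
# 384 577
# 224 197
```

Under these assumptions, fine-tuning at 224 uses 197 tokens instead of 577, so position embeddings pretrained at 384 would have to be resized (for example by 2D interpolation, as the original ViT paper describes for changing resolution) before they match the shorter sequence.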

Closing as this is a question about modeling generally, suitable for research discussion in the original research repository, and not a problem with the code in this repository.