A complete, easy-to-follow implementation of Google's Vision Transformer, proposed in "An Image Is Worth 16x16 Words". This PyTorch implementation is commented throughout for better understanding.
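As a rough illustration of the "16x16 words" idea in the paper's title (a sketch only, not code from this repository): a Vision Transformer cuts each image into fixed-size square patches and treats each flattened patch as one token. The helper `patch_grid` below is a hypothetical name, and the image sizes are assumed examples (224 pixels as a typical ImageNet-style input, 32 pixels as CIFAR10's resolution).

```python
def patch_grid(image_size: int, patch_size: int, channels: int = 3):
    """Return (number of patches, flattened length of one patch).

    Assumes a square image and square, non-overlapping patches.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    per_side = image_size // patch_size
    num_patches = per_side * per_side
    patch_dim = patch_size * patch_size * channels
    return num_patches, patch_dim


# Assumed example sizes, chosen only for illustration.
print(patch_grid(224, 16))  # (196, 768)
print(patch_grid(32, 16))   # (4, 768)
print(patch_grid(32, 4))    # (64, 48)
```

With 16x16 patches a 32x32 image yields only 4 tokens, so the patch size is one of the settings that may need adjusting for small images.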
Hi, thank you for sharing this code base.
Could you share the hyperparameters you used for training on CIFAR10?
Running the code as is yields an accuracy of around 75%.