tahmid0007 / VisionTransformer

A complete, easy-to-follow implementation of Google's Vision Transformer, proposed in "An Image Is Worth 16x16 Words". This PyTorch implementation is commented for better understanding.

Performances

ssram50 opened this issue 3 years ago

ssram50 commented 3 years ago

Hi, thank you for sharing this code base.
Could you share the hyperparameters you used for training on CIFAR10?
Running the code as is yields an accuracy of around 75%.