Fine-tuning the ViT model google/vit-base-patch16-224-in21k on a dataset of indoor scenes.
- The data comes from Kaggle: https://www.kaggle.com/itsahmad/indoor-scenes-cvpr-2019
- The code is inspired by: https://github.com/NielsRogge/Transformers-Tutorials/tree/master/VisionTransformer
- See image-classification-finetuning.ipynb for how the model is fine-tuned.
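
As a rough orientation, preparing the checkpoint for fine-tuning might look like the sketch below. It is not taken from the notebook: it assumes the Kaggle data is unpacked with one subfolder per scene class, and the helper names `list_classes`, `build_label_maps`, and `load_model` are hypothetical. The only library call used is `ViTForImageClassification.from_pretrained` from Hugging Face `transformers`, which attaches a freshly initialised classification head sized to the number of classes.

```python
from pathlib import Path

# Checkpoint named at the top of this README.
MODEL_NAME = "google/vit-base-patch16-224-in21k"


def list_classes(data_dir):
    """Return sorted class names.

    Assumption: data_dir contains one subfolder per scene class.
    """
    return sorted(p.name for p in Path(data_dir).iterdir() if p.is_dir())


def build_label_maps(class_names):
    """Build the id <-> label dictionaries stored in the model config."""
    id2label = dict(enumerate(class_names))
    label2id = {name: idx for idx, name in id2label.items()}
    return id2label, label2id


def load_model(class_names):
    """Load the pretrained ViT backbone with a new classification head."""
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import ViTForImageClassification

    id2label, label2id = build_label_maps(class_names)
    return ViTForImageClassification.from_pretrained(
        MODEL_NAME,
        num_labels=len(class_names),
        id2label=id2label,
        label2id=label2id,
    )
```

A call such as `load_model(list_classes("path/to/images"))` (the path is a placeholder) would download the checkpoint on first use. Passing the label maps makes predictions come back with readable class names rather than bare indices.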