Artwork classification with PyTorch

Image classification in PyTorch using convolutional and transformer-based models.

Table of Contents

  1. Project Overview
  2. Installation
  3. File Descriptions
  4. Modelling
  5. Results
  6. Deploy a Gradio demo to HuggingFace Spaces
  7. Blog post
  8. Acknowledgements

Project Overview

The aim of this project is to classify artworks using the ArtBench dataset. The dataset includes 60,000 images of artworks from 10 different artistic styles, including paintings, murals, and sculptures from the 14th to the 21st century. Each style has 5,000 training images and 1,000 test images.
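
For reference, here is a minimal sketch of loading the dataset with torchvision, assuming the standard ImageFolder layout of the artbench-10-imagefolder-split download (the exact transforms and batch size in the notebook may differ):

```python
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),   # resize to the models' expected input size
    transforms.ToTensor(),
])

train_data = datasets.ImageFolder("artbench-10-imagefolder-split/train", transform=transform)
test_data = datasets.ImageFolder("artbench-10-imagefolder-split/test", transform=transform)

train_loader = DataLoader(train_data, batch_size=32, shuffle=True)
test_loader = DataLoader(test_data, batch_size=32, shuffle=False)

print(train_data.classes)  # the 10 artistic styles
```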

I used PyTorch to build convolutional and transformer-based models. Specifically, I fine-tuned the pre-trained EfficientNet_B2 and ViT_B16 models.
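
As a rough sketch of the transfer-learning setup, loading the two pre-trained torchvision models and swapping their heads for the 10 ArtBench classes looks like this (the exact classifier heads in the notebook may differ):

```python
import torch
from torchvision import models

# EfficientNet_B2 with ImageNet weights; replace the classifier head.
effnet = models.efficientnet_b2(weights=models.EfficientNet_B2_Weights.DEFAULT)
effnet.classifier = torch.nn.Sequential(
    torch.nn.Dropout(p=0.3),                           # same rate as the original head
    torch.nn.Linear(in_features=1408, out_features=10),
)

# ViT_B16 with ImageNet weights; replace the classification head.
vit = models.vit_b_16(weights=models.ViT_B_16_Weights.DEFAULT)
vit.heads = torch.nn.Linear(in_features=768, out_features=10)
```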

I also replicated the ViT paper and built the ViT model from scratch to deepen my understanding of the Transformer architecture.
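
The first building block of such a from-scratch replication is the patch embedding. A sketch of one common implementation, using a strided Conv2d to extract and project non-overlapping patches in a single step:

```python
import torch
from torch import nn

class PatchEmbedding(nn.Module):
    """Split an image into patches and project each patch to an embedding vector."""

    def __init__(self, in_channels: int = 3, patch_size: int = 16, embed_dim: int = 768):
        super().__init__()
        # kernel_size == stride == patch_size extracts non-overlapping patches
        # and linearly projects them in one convolution.
        self.proj = nn.Conv2d(in_channels, embed_dim,
                              kernel_size=patch_size, stride=patch_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.proj(x)            # (B, 768, 14, 14) for a 224x224 input
        x = x.flatten(start_dim=2)  # (B, 768, 196)
        return x.permute(0, 2, 1)   # (B, 196, 768): a sequence of patch tokens

patches = PatchEmbedding()(torch.randn(1, 3, 224, 224))
print(patches.shape)  # torch.Size([1, 196, 768])
```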

To speed up model training, I used the two free NVIDIA T4 GPU accelerators available on Kaggle.
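
A hedged sketch of one way to use both GPUs (assuming the simple nn.DataParallel approach; the notebook may use a different strategy):

```python
import torch
from torch import nn
from torchvision import models

model = models.efficientnet_b2()  # stand-in for the model being trained

device = "cuda" if torch.cuda.is_available() else "cpu"
if torch.cuda.device_count() > 1:
    # Replicates the model on both T4 GPUs and splits each batch between them.
    model = nn.DataParallel(model)
model = model.to(device)
```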

I also created a Gradio demo and deployed the app to HuggingFace Spaces.

You can find the app here.
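
For illustration, a minimal Gradio app for this kind of classifier could look like the sketch below; the model weights and class names are placeholders, and the actual app in the gradio_demo folder may differ:

```python
import gradio as gr
import torch
from torchvision import models, transforms

# Placeholders: the real app loads the fine-tuned weights from the .pth file
# and uses the actual 10 ArtBench style names.
model = models.efficientnet_b2()
model.classifier[1] = torch.nn.Linear(1408, 10)   # 10-class head
model.eval()
class_names = [f"style_{i}" for i in range(10)]   # placeholder labels

transform = transforms.Compose([transforms.Resize((224, 224)),
                                transforms.ToTensor()])

def predict(img):
    """Return class probabilities for a single PIL image."""
    x = transform(img).unsqueeze(0)
    with torch.inference_mode():
        probs = torch.softmax(model(x), dim=1).squeeze()
    return {name: float(p) for name, p in zip(class_names, probs)}

demo = gr.Interface(fn=predict,
                    inputs=gr.Image(type="pil"),
                    outputs=gr.Label(num_top_classes=3),
                    title="Artwork classification")
demo.launch()
```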

Installation

This project requires Python 3 and the following Python libraries installed:

torch, torchvision, torchinfo, torchmetrics, mlxtend, pandas, numpy, sklearn, matplotlib, wget, tarfile, gradio
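
All of these except tarfile (which ships with the Python standard library) can be installed with pip, for example (using the usual PyPI package names, e.g. scikit-learn for sklearn):

```
pip install torch torchvision torchinfo torchmetrics mlxtend pandas numpy scikit-learn matplotlib wget gradio
```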

File Descriptions

The main file of the project is artwork_classification.ipynb.

The project folder also contains the following:

  • artbench-10-imagefolder-split folder: Download the artbench dataset into this folder.
  • results folder: Includes model metrics and figures.
  • models folder: Includes fine-tuned EfficientNet_B2 model (.pth file).
  • gradio_demo folder: Includes the Gradio demo application.

Modelling

I built the following models:

  1. TinyVGG: a small baseline model, as described on the CNN Explainer website.
  2. Convolutional-based model: EfficientNet_B2 feature extraction on 20% of the data, with data augmentation (see the sketch after this list).
  3. EfficientNet_B2: fine-tuning on the full dataset.
  4. Transformer-based model: ViT_B16 feature extraction on 20% of the data, with data augmentation.
  5. ViT_B16: fine-tuning on the full dataset.
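
The difference between the feature-extraction runs (models 2 and 4) and the fine-tuning runs (models 3 and 5) comes down to which parameters are trainable. A sketch for EfficientNet_B2 (ViT_B16 is analogous):

```python
import torch
from torchvision import models

model = models.efficientnet_b2(weights=models.EfficientNet_B2_Weights.DEFAULT)

# Feature extraction: freeze the pre-trained backbone so that only the new
# classifier head is trained.
for param in model.features.parameters():
    param.requires_grad = False

model.classifier = torch.nn.Sequential(
    torch.nn.Dropout(p=0.3),
    torch.nn.Linear(1408, 10),
)

# Fine-tuning: unfreeze everything and train end to end, typically with a
# lower learning rate.
for param in model.parameters():
    param.requires_grad = True
```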

Results

The EfficientNet_B2 model outperforms the ViT_B16 model in all performance metrics. It achieves the highest accuracy, lowest loss, smallest size, and shortest prediction time per image.
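
As an example of how prediction time per image can be measured (a rough sketch; the notebook's benchmarking code may differ):

```python
import time
import torch
from torchvision import models

model = models.efficientnet_b2()   # stand-in; load the fine-tuned weights in practice
model.eval()
x = torch.randn(1, 3, 224, 224)    # dummy input in place of a real image

with torch.inference_mode():
    _ = model(x)                   # warm-up pass
    start = time.perf_counter()
    for _ in range(100):
        model(x)
    per_image = (time.perf_counter() - start) / 100

print(f"Average prediction time per image: {per_image * 1000:.1f} ms")
```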

Deploy a Gradio demo to HuggingFace Spaces

To deploy the Gradio demo to HuggingFace Spaces, follow these steps:

  1. Create a new space (i.e., a code repository) and give it a name: [SPACE_NAME].
  2. Select Gradio as the Space SDK and CPU basic (free) as Space hardware.

Then, follow the standard git workflow:

  1. Clone the repo locally: git clone https://huggingface.co/spaces/[USERNAME]/[SPACE_NAME]

  2. Copy the contents of the gradio_demo folder to the cloned repo folder.

  3. Hugging Face no longer accepts passwords for authenticating command-line Git operations; you need to use a personal access token instead, as explained here. Point the remote at a token-authenticated URL:

        `git remote set-url origin https://[USERNAME]:[TOKEN]@huggingface.co/spaces/[USERNAME]/[SPACE_NAME]`
    
  4. git add .

  5. git commit -m "first commit"

  6. git push

Blog post

I wrote a blog post about this project. You can find it here.

Acknowledgements

Credit goes to the authors of the ArtBench dataset.
