
Christian Art Tagging

Tagging Christian Art pieces using Computer Vision!

Summary of Contents

  1. About The Project
  2. Pipeline
  3. Milestones
  4. Future Deliverables
  5. Getting Started
  6. Usage
  7. Blog
  8. Roadmap
  9. License
  10. Contact

About The Project

Christian iconography is the study of identifying the saints in a painting from their attributes, such as a crucifix, pedestal, or key. Art historians such as Émile Mâle have spent a significant amount of time describing this art, and thanks to their contributions most of it is available digitally with captions. However, there still is no "ground-truth" dataset for tagging it. The first goal of this project is therefore to curate such a dataset. As a second direction, the project proposes a Multimodal Transformer to automate the tagging process. While this would never replace an actual historian, it could reduce the burden on professionals and help people who are not well-versed in Christian art.


Pipeline

Milestones

  • Curated a dataset of 9.6k images with captions, titles, and miscellaneous metadata
  • Built a feature-extractor module that can also detect the saints and their attributes in a painting
  • Delivered a ViT+GPT-2 based Transformer that uses images + prompts to caption the Christian icons in a painting

Future Deliverables

  • Improving the quality of the curated captions in order to build a more robust model
  • Improving the multimodal transformer's ability to describe intricate details in a painting
  • Building a module that lets users add new data points to the dataset I have curated

Getting Started

Prerequisites

  • Python versions 3.6-3.10

Setup

  1. Clone the repository from GitHub
git clone https://github.com/lazyCodes7/RedHenLab_Multimodal_Christian_Art_Tagging.git
  2. Go to the directory where the repository is cloned and activate a virtual environment as follows
cd RedHenLab_Multimodal_Christian_Art_Tagging
virtualenv cenv
source cenv/bin/activate
  3. Install the Python requirements
pip install -r requirements.txt
  4. Install Docker and Singularity
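A quick way to confirm the prerequisites are in place before continuing (optional; these are the tools' standard version flags):

python --version        # should report a version between 3.6 and 3.10
docker --version
singularity --version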

Usage

1. Running the Emile Male Pipeline on Case HPC

Intro to Case HPC

Computationally intensive research is supported through a continuously growing high-performance computing cluster based on Dell PowerEdge servers with Intel processors, running Red Hat Enterprise Linux. Details of the cluster configuration can be found at the link below.

Know more here

Logging in to Case HPC + Locating the home directory + Required files

Follow these steps, replacing rpm93 with your username:

ssh -vv rpm93@rider.case.edu
# After login
cd /mnt/rds/redhen/gallina/home/rpm93
# After reaching home
git clone https://github.com/lazyCodes7/RedHenLab_Multimodal_Christian_Art_Tagging.git

Requesting GPU on Case HPC

Case Western provides access to certain GPU nodes for processing. To request one, first run si to see which nodes are available, then run the following command to get an interactive session on a GPU node:

srun -p gpu -C gpu2v100 --nodelist=gpu059t --gres=gpu:1 --mem=50gb --pty bash
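Once the session starts, you can check that a GPU was actually allocated (nvidia-smi is NVIDIA's standard utility, not something specific to this project):

nvidia-smi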

View more on this in my blog

Syncing files on Case HPC

Now you are hopefully on a GPU node. But all the files have disappeared! So we need to sync them over. Here is an example for the username rpm93:

rsync -az hpc3:/mnt/rds/redhen/gallina/home/rpm93/RedHenLab_Multimodal_Christian_Art_Tagging/ /tmp/$USER/
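To confirm the sync worked, list the destination directory (a plain ls; the path comes from the rsync command above):

ls /tmp/$USER/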

Navigate to the Pipeline + Running the script

Once the files are synced, the next step is to locate the script that runs the pipeline. The example below is for the user rpm93:

cd /tmp/rpm93/pipeline
bash run_pipeline.sh

Interpreting the script

#!/bin/bash
echo "-----------------------------------------------------"
echo "Emile Male Pipeline: Training"
rsync -az hpc3:/mnt/rds/redhen/gallina/home/rpm93/RedHenLab_Multimodal_Christian_Art_Tagging/ /tmp/$USER/
echo "Files synced successfully"

echo "Creating singularity environment..."
module load singularity/3.8.1
singularity pull docker://ghcr.io/lazycodes7/christian-art-tagging:latest
mkdir curation/EmileMaleDataset
echo "Environment successfully created."

echo "Stage: 1 - Generating the Curated Dataset"
singularity run --nv christian-art-tagging_latest.sif python curation/generator.py --metadata_path curation/metadata_v2.csv --data_dir curation/EmileMaleDataset/

echo "Stage: 2 - Training the feature-extractor to extract patch-level features"
singularity run --nv christian-art-tagging_latest.sif python feature_extractor/train.py -c --train --device cuda --data_dir feature_extractor/

echo "Stage: 3 - Training the Image-Captioning model that uses intra-modal features"
singularity run --nv christian-art-tagging_latest.sif python captioning/train.py --data_dir curation/EmileMaleDataset/ --feature_extractor_path feature_extractor/artDL.pt --device cuda --train

As the script shows, the pipeline chains three individual modules. The first, curation, downloads the dataset I have collected, as the name suggests. The second trains a feature extractor that extracts features from the images; the current version of the pipeline uses a Vision Transformer instead of an FRCNN. Finally, the captioning model, a transformer, leverages both the image and the text to generate captions.

2. Running the modules from the pipeline locally.

To run the modules from the pipeline, follow the steps in Setup and then use cd pipeline to navigate to the pipeline directory.

Singularity setup

Singularity is not needed to run the modules locally, but here is how to set up the container if you want to use it:

singularity pull docker://ghcr.io/lazycodes7/christian-art-tagging:latest
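As a quick sanity check, you can run a trivial command inside the pulled image (singularity exec is the standard way to execute a single command in a container):

singularity exec christian-art-tagging_latest.sif python --version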

Running the curation module locally.

Help
    -p/--metadata_path - The path to metadata file
    -d/--data_dir - The path to download the dataset to
    -f/--force - Force download an image even if it is already downloaded
Example
# using singularity
singularity run --nv christian-art-tagging_latest.sif python curation/generator.py --metadata_path curation/metadata_v2.csv --data_dir curation/EmileMaleDataset/

# just running
python curation/generator.py --metadata_path curation/metadata_v2.csv --data_dir curation/EmileMaleDataset/
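If a download was interrupted or some images look corrupted, the -f/--force flag listed above re-downloads images even if they are already present; a sketch of the assumed usage:

python curation/generator.py --metadata_path curation/metadata_v2.csv --data_dir curation/EmileMaleDataset/ --force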

Outputs

9.6k data points from different museums, covering a range of art styles

Running the feature_extraction module locally.

Help
    -b, --train_batch_size - Batch size for training
    --data_dir - Directory to use images for training
    --device - Device to train/test on
    --train - Flag to train/test
    -p,--model_path - Trained model path to be used during testing
    -c, --collect - Collect the ArtDL dataset if not already present

Example
# using singularity
singularity run --nv christian-art-tagging_latest.sif python feature_extractor/train.py -c --train --device cuda --data_dir feature_extractor/

# just running
python feature_extractor/train.py -c --train --device cuda --data_dir feature_extractor/
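To evaluate a saved checkpoint instead of training, the help above suggests omitting --train and passing the weights via -p/--model_path; the exact test behavior is assumed from the flag descriptions:

python feature_extractor/train.py --device cuda --data_dir feature_extractor/ --model_path feature_extractor/artDL.pt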
Outputs

Weights for the feature-extraction model, which are used later by the captioning module

Running the captioning module locally.

Help
    --metadata_path - The path to metadata file
    --data_dir - Directory to use images for training
    --feature_extractor_path - Model path from the stage-2 feature extractor
    -e, --epochs - Number of epochs to train the model for
    -lr, --learning_rate - Learning rate for the model
    -d, --device - Device to train/test on
    --train - Flag to train/test
    --train_batch_size - Batch size for training
    --captioning_model_path - Saved path of the captioning model (useful during inference)

Example
# using singularity
singularity run --nv christian-art-tagging_latest.sif python captioning/train.py --data_dir curation/EmileMaleDataset/ --feature_extractor_path feature_extractor/artDL.pt --device cuda --train

# just running
python captioning/train.py --data_dir curation/EmileMaleDataset/ --feature_extractor_path feature_extractor/artDL.pt --device cuda --train
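For inference with a trained model, the flags above suggest dropping --train and supplying --captioning_model_path; the checkpoint path below is hypothetical:

# captioning_model.pt is a hypothetical path; point this at wherever your trained model was saved
python captioning/train.py --data_dir curation/EmileMaleDataset/ --feature_extractor_path feature_extractor/artDL.pt --captioning_model_path captioning_model.pt --device cuda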

Outputs

Captions for the given prompts during inference, plus the associated captioning metrics during training and testing. See the results here

Blog

Read more about how this project has progressed here

Roadmap

See the open issues for a list of proposed features (and known issues).

License

Distributed under the MIT License. See LICENSE for more information.

Contact

Rishab Mudliar - @cheesetaco19 - rishabmudliar@gmail.com

Telegram: lazyCodes7
