aneesh-shetye / image_captioning

Final Year Thesis Project on Image Captioning

Experiments for Image Captioning using slot attention:

This is my final year thesis project for improving performance on image captioning.

Current architecture:

(architecture diagram: ezgif-1-221093d2b5)

Steps to run the code:

1. Create a conda env:

Create an environment:

conda create --name imgcap

Activate the environment:

conda activate imgcap

Install the dependencies:

pip install -r requirements.txt
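Note that a conda environment created with no packages has no Python interpreter of its own, so `pip` may resolve to a system copy. A minimal sketch that pins a Python version (3.9 is an assumption, not a stated repo requirement) so the install lands inside the environment:

```shell
# Assumed variant: pin a Python version so pip installs into the env.
conda create --name imgcap python=3.9 -y
conda activate imgcap
pip install -r requirements.txt
```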

2. Downloading the data:

Create a directory: ~/datasets/ using:

mkdir ~/datasets/

Flickr8k:

Download the dataset:

wget -P ~/datasets/ https://www.kaggle.com/datasets/adityajn105/flickr8k/download?datasetVersionNumber=1

Extract the dataset:

unzip flickr8k.zip
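Kaggle's download URL requires an authenticated session, so a plain `wget` may return an HTML login page instead of the archive. A hedged alternative using the official Kaggle CLI (assumes `pip install kaggle` and an API token at `~/.kaggle/kaggle.json`; the target paths are illustrative):

```shell
# Download and extract Flickr8k via the Kaggle CLI (assumed setup:
# kaggle installed and ~/.kaggle/kaggle.json holding your API token).
kaggle datasets download -d adityajn105/flickr8k -p ~/datasets/
unzip ~/datasets/flickr8k.zip -d ~/datasets/flickr8k
```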

GQA:

Downloading GQA Images:

Create a subdirectory: ~/datasets/gqa_imgs using:

mkdir ~/datasets/gqa_imgs

Download GQA imgs:

wget -P ~/datasets/gqa_imgs https://downloads.cs.stanford.edu/nlp/data/gqa/images.zip

Extract the images folder:

unzip images.zip

Downloading GQA Annotations:

Create a subdirectory: ~/datasets/gqa_ann using:

mkdir ~/datasets/gqa_ann

Download GQA annotations:

wget -P ~/datasets/gqa_ann https://zenodo.org/record/4729015/files/mdetr_annotations.tar.gz?download=1

Rename and extract annotations folder:

mv 'mdetr_annotations.tar.gz?download=1' ann.tar.gz
tar -xvzf ann.tar.gz

3. Running the code on a single GPU system:

Log into wandb

wandb login

(This prints a link to your wandb API key; open it, copy the key, and paste it into the terminal.)
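On a headless machine (e.g. a cluster node) the interactive prompt can be skipped by exporting the key beforehand; this is standard wandb behavior, and the key value is a placeholder you supply yourself:

```shell
# Non-interactive login: wandb reads the key from the environment.
# Obtain your key from https://wandb.ai/authorize (placeholder below).
export WANDB_API_KEY=<your-api-key>
wandb login
```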

Pre-training on GQA dataset:

python train.py --epochs=35 

Training on PhraseCut dataset:

python train_obj.py --epochs=10 --load=1 

Fine-tuning on GQA dataset:

python train.py --epochs=25 --load=1
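The three stages above can be chained in a single script; the sketch below uses exactly the commands listed, on the assumption that `--load=1` resumes from the checkpoint written by the previous stage:

```shell
#!/usr/bin/env bash
# Full single-GPU pipeline: pre-train, PhraseCut training, fine-tune.
set -e                                    # stop if any stage fails
python train.py --epochs=35               # pre-train on GQA
python train_obj.py --epochs=10 --load=1  # train on PhraseCut, resume weights
python train.py --epochs=25 --load=1      # fine-tune on GQA
```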

4. Running the code on a multi-GPU system:

Change into the project directory:

cd slotvqa

Schedule the job:

srun test.job.sbatch
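`test.job.sbatch` ships with the repo and its contents are not reproduced here, but a typical Slurm batch file for a multi-GPU run has the following shape (every directive below is illustrative, not the repo's actual configuration). Note that `sbatch test.job.sbatch` is the usual way to queue such a file, whereas `srun` executes it as an interactive job step:

```shell
#!/bin/bash
# Hypothetical sbatch file sketch -- directives are placeholders,
# not the repo's real settings.
#SBATCH --job-name=imgcap
#SBATCH --gres=gpu:2          # request 2 GPUs
#SBATCH --time=24:00:00
#SBATCH --output=imgcap_%j.log

srun python train.py --epochs=35
```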
