This repository includes the reference code for paper:
Dynamic Early Exit for Efficient Image Captioning
To run the code, annotations and images for the COCO dataset are needed. Please download the zip files including the images (train2014.zip, val2014.zip), the zip file containing the annotations (annotations_trainval2014.zip) and extract them. These paths will be set as arguments later. Our code supports the image features extracted from conventional Faster-RCNN or CLIP model.
Run python train_deecap.py
using the following arguments:
Argument | Possible values |
---|---|
--exp_name |
Experiment name (default: deecap) |
--train_data_path |
Path to the training dataset |
--features_path |
Path to detection features file (optional) |
--annotation_folder |
Path to folder with annotations (optional) |
--tokenizer_path |
Path to the tokenizer |
--out_dir |
Path to the saved checkpoint |
--batch_size |
Batch size (default: 10) |
--lr |
Learning rate (default: 1e-4) |
To reproduce the results reported in our paper, download the checkpoint model file and place it in the ckpt folder.
Run python test.py
using the following arguments:
Argument | Possible values |
---|---|
--batch_size |
Batch size (default: 10) |
--features_path |
Path to detection features file |
--annotation_folder |
Path to folder with COCO annotations |
This repository refers to Transformer Image Captioning and huggingface DeeBERT.