This repo contains the original code used for the experiments in the paper RATT: Recurrent Attention to Transient Tasks for Continual Image Captioning, and can be used to replicate its results.
This work was accepted at the ICML 2020 LifelongML Workshop and has since been accepted at NeurIPS 2020.
Research on continual learning has led to a variety of approaches to mitigating catastrophic forgetting in feed-forward classification networks. Until now surprisingly little attention has been focused on continual learning of recurrent models applied to problems like image captioning. In this paper we take a systematic look at continual learning of LSTM-based models for image captioning. We propose an attention-based approach that explicitly accommodates the transient nature of vocabularies in continual image captioning tasks -- i.e. that task vocabularies are not disjoint. We call our method Recurrent Attention to Transient Tasks (RATT), and also show how to adapt continual learning approaches based on weight regularization and knowledge distillation to recurrent continual learning problems. We apply our approaches to the incremental image captioning problem on two new continual learning benchmarks we define using the MS-COCO and Flickr30k datasets. Our results demonstrate that RATT is able to sequentially learn five captioning tasks while incurring no forgetting of previously learned ones.
Results and comparisons on MS-COCO:
Results and comparisons on Flickr30k:
Numbers are the per-task performance after training on the last task. Per-task forgetting in the last row is the BLEU-4 performance after the last task divided by the BLEU-4 performance measured immediately after learning each task.
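For concreteness, the forgetting metric can be computed as in the following sketch (the BLEU-4 numbers are hypothetical, purely for illustration):

```python
# Hypothetical illustration of the per-task forgetting ratio described above.
# A ratio of 1.0 means no forgetting; lower values mean more forgetting.
bleu4_right_after_task = 0.25  # BLEU-4 on task i, measured right after learning task i
bleu4_after_last_task = 0.20   # BLEU-4 on task i, measured after training on the last task

forgetting_ratio = bleu4_after_last_task / bleu4_right_after_task
print(f'per-task forgetting: {forgetting_ratio:.2f}')  # -> 0.80
```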
This is the list of Python requirements:
python==3.8.2
torch==1.4.0
torchvision==0.5.0
numpy==1.18.1
pandas==1.0.3
Pillow==7.2.0
h5py==2.10.0
matplotlib==3.1.3
seaborn==0.10.1
bidict==0.19.0
dacite==1.5.0
nltk==3.4.5
pycocotools==2.0.0
tqdm==4.43.0
attrs==19.3.0
attr==0.3.1
rouge_score==0.0.3
nlg_eval==2.3
dataclasses==0.7
On a common Linux distribution like Ubuntu, you can create a working environment for this project with the following procedure:
- Create the conda environment with the provided `environment.yml` file: `conda env create -f environment.yml`
- Activate the `ratt` environment: `conda activate ratt`
- Install the remaining packages manually:
  `conda install pytorch==1.4.0`
  `pip install dacite==1.5.0`
  `pip install rouge_score==0.0.3`
- Install nlg-eval following the instructions described in the nlg-eval repo, i.e.:
  `pip install git+https://github.com/Maluuba/nlg-eval.git@master`
  `conda install click`
  `nlg-eval --setup`
The installation process of nlg-eval is not straightforward; we advise visiting the original repo page: https://github.com/Maluuba/nlg-eval
If you like to use PyCharm on a small laptop and run experiments on a remote interpreter located on a powerful server (as I do), you should read this section to correctly set up nlg-eval for the remote interpreter:
- I assume you are using a conda environment with name TRUE_ENV_NAME in the remote server.
- Install all the required packages for the current project.
- Create a new fake environment with name FAKE_ENV_NAME on your remote machine, containing a script file python.sh with permissions 755 and the following content:
#!/bin/bash -l
/CONDAPATH/envs/TRUE_ENV_NAME/bin/python "$@"
- Set the PyCharm remote interpreter to:
/CONDAPATH/envs/FAKE_ENV_NAME/bin/python.sh
- Now bash variables will be initialized by PyCharm before each execution (and so Java can be executed).
We report here the commands to be executed in order to replicate the paper experiments.
All the training experiments are executed with a fixed seed (42).
The `--gpu 0` parameter can be changed to use a different GPU index.
In order to train the models we first have to pre-process the datasets. We use ResNet-152 pre-trained on ImageNet to extract features from all the images of the original dataset. Each image is resized so that its shorter side is 256 pixels, then a 224x224 patch is center-cropped before feature extraction.
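The following is a minimal sketch of this pre-processing, assuming a standard torchvision pipeline (the actual implementation lives in `coco_feats.py` / `flickr30k_feats.py`; names here are illustrative):

```python
# Hedged sketch of the pre-processing described above, not the exact code
# from coco_feats.py: resize the shorter side to 256, center-crop 224x224,
# then extract pooled ResNet-152 features.
import torch
import torchvision.models as models
import torchvision.transforms as transforms
from PIL import Image

transform = transforms.Compose([
    transforms.Resize(256),       # shorter side -> 256 pixels
    transforms.CenterCrop(224),   # central 224x224 patch
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],  # standard ImageNet stats
                         std=[0.229, 0.224, 0.225]),
])

resnet = models.resnet152(pretrained=True)
# Drop the final classification layer to obtain 2048-d pooled features.
extractor = torch.nn.Sequential(*list(resnet.children())[:-1]).eval()

with torch.no_grad():
    image = transform(Image.open('example.jpg').convert('RGB')).unsqueeze(0)
    features = extractor(image).flatten(1)  # shape: (1, 2048)
```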
To pre-process MS-COCO:
- Change the path to the MS-COCO dataset in the first line of `coco_settings.py`
- Run the script `coco_feats.py`; you can use custom parameters if you like, e.g. `--gpu`, `--workers` or `-bs`.
To pre-process Flickr30k:
- Change the path to the Flickr30k dataset in the first line of `flickr30k_settings.py`
- Run the script `flickr30k_feats.py`.
To train the models on MS-COCO with the proposed TASFI split:
python train.py --gpu 0 --seed 42 -j coco-TASFI -ee 10 -mdl 26 -f coco_ft
python train.py --gpu 0 --seed 42 -j coco-TASFI -ee 10 -mdl 26 -f coco_ewc -a ewc --ewc-sampling multinomial --ewc-lambda 10
python train.py --gpu 0 --seed 42 -j coco-TASFI -ee 10 -mdl 26 -f coco_lwf -a lwf --lwf-T 1
python train.py --gpu 0 --seed 42 -j coco-TASFI -ee 10 -mdl 26 -f coco_ratt -a ratt --ratt-usage 60 --ratt-smax 400
To evaluate on MS-COCO-TASFI test-set:
python eval.py --gpu 0 --seed 42 -j coco-TASFI --test -f coco_ft
python eval.py --gpu 0 --seed 42 -j coco-TASFI --test -f coco_ewc
python eval.py --gpu 0 --seed 42 -j coco-TASFI --test -f coco_lwf
python eval.py --gpu 0 --seed 42 -j coco-TASFI --test -f coco_ratt
To train the models on Flickr30K with the proposed SAVI split:
python train.py --gpu 0 --seed 42 -j flickr30k-SAVI -lr 1e-4 -bs 32 --nb-epochs 50 --extra-epochs 20 -mdl 40 -f flickr_ft
python train.py --gpu 0 --seed 42 -j flickr30k-SAVI -lr 1e-4 -bs 32 --nb-epochs 50 --extra-epochs 20 -mdl 40 -f flickr_ewc -a ewc --ewc-sampling multinomial --ewc-lambda 20
python train.py --gpu 0 --seed 42 -j flickr30k-SAVI -lr 1e-4 -bs 32 --nb-epochs 50 --extra-epochs 20 -mdl 40 -f flickr_lwf -a lwf --lwf-T 1
python train.py --gpu 0 --seed 42 -j flickr30k-SAVI -lr 1e-4 -bs 32 --nb-epochs 50 --extra-epochs 20 -mdl 40 -f flickr_ratt -a ratt --ratt-usage 60 --ratt-smax 400
Finally, to evaluate the trained models on Flickr30K-SAVI test set:
python eval.py -j flickr30k-SAVI --gpu 0 -f flickr_ft --test
python eval.py -j flickr30k-SAVI --gpu 0 -f flickr_ewc --test
python eval.py -j flickr30k-SAVI --gpu 0 -f flickr_lwf --test
python eval.py -j flickr30k-SAVI --gpu 0 -f flickr_ratt --test
After training, a folder for each model will be created inside the `models/` directory.
In each model folder you will find the weights of the model at the end of each training epoch,
a result CSV file for each task reporting per-epoch performance on the validation set,
a result CSV file aggregating all epochs (results_all_[...].csv), and a result CSV file containing only the epochs
up to the best-performing epoch on the validation set with respect to the BLEU-4 score (results_best_[...].csv).
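A minimal sketch for inspecting these CSV files with pandas (the exact file names and column layout depend on the job and are not assumed here):

```python
# Inspect the per-model result CSVs described above; file names follow
# the results_best_[...].csv pattern, columns depend on the job.
import glob
import pandas as pd

for path in sorted(glob.glob('models/coco_ratt/results_best_*.csv')):
    df = pd.read_csv(path)
    print(path, df.shape)
    print(df.head())
```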
In this section we describe the main command-line tools and other scripts that can be used to run the experiments and evaluate trained models.
The train.py script can be used to train a model from scratch or to continue the training of an existing model.
Use `-j JOB_NAME` to select a job, `-a APPROACH_NAME` to select an approach from the available ones, `-ne N` to choose the number of training epochs for each task, `-lr` to select the learning rate (e.g. `-lr 1e-5`), `-bs` to select the batch size, `--gpu X` to run on the selected GPU, etc. Each technique also has a set of available arguments or flags starting with the technique name, which are simply ignored when used with a different technique.
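As an illustration of this convention, here is a sketch of the flag-prefix pattern (not the actual parser used by `train.py`):

```python
# Illustrative argparse sketch: technique-specific flags share a common
# prefix and are simply never read when another approach is selected.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('-a', '--approach', choices=['ft', 'ewc', 'lwf', 'ratt'], default='ft')
parser.add_argument('--ewc-lambda', type=float, default=10.0)   # read only when approach == 'ewc'
parser.add_argument('--ratt-smax', type=float, default=400.0)   # read only when approach == 'ratt'
args = parser.parse_args()

# Passing e.g. --ewc-lambda together with '-a ratt' is harmless:
# the flag is parsed but never used.
if args.approach == 'ewc':
    print('EWC loss multiplier:', args.ewc_lambda)
```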
$ python train.py --help
Train a continual learning model for image captioning with different approaches.
optional arguments:
-h, --help show this help message and exit
-j, --job {coco-TASFI,flickr30k-SAVI} Select the job name to use for the current experiment.
--test Use test-set instead of validation set
-bs BS Batch size to be used during training and evaluation.
-f, --folder FOLDER Model folder where to load/save weights and csv files.
-a, --approach {ft, ewc, lwf, ratt, ratt_ablation}
-t, --task TASK Continue training the model from the selected task.
-e, --epoch EPOCH Continue training the model from the selected epoch, loading weights from the previous one. If epoch 1 is chosen, the best epoch of the previous task will be loaded.
-l, --load LOAD Load the best model weights from the first task of the selected model/folder
--hidden-size HIDDEN_SIZE Number of neurons in LSTM hidden layer (hidden-state size)
-emb-size EMB_SIZE Number of neurons in image and word embedding layers (LSTM input size)
-mdl, --max-decode-len MDL Max decoding length for sampling (evaluation)
-ne, --nb-epochs NB_EPOCHS Number of training epochs to run on each task.
-ex, --examples EXAMPLES Number of examples to use in each task during training, useful to speedup debugging.
-ee, --extra-epochs EXTRA Extra epochs for the first task.
-lr LR Learning rate for Adam optimization algorithm
-wd WD Weight decay regularization.
--freeze-old-words Prevent words from being trained in the current task when they appeared in one of the previous tasks
--ewc-sampling {true,max_pred,multinomial}
--ewc-teacher-forcing Enable teacher forcing when computing fisher matrix
--ewc-lambda EWC_LAMBDA Loss multiplier applied to EWC loss
--lwf-lambda LWF_LAMBDA Loss multiplier applied to LwF loss
--lwf-T LWF_T Temperature for LwF loss
--lwf-h-distill Distill hidden state together with output predictions
--lwf-h-lambda LWF_H_LAMBDA
Loss multiplier applied to hidden state LwF loss, when --lwf-h-distill is enabled
--ratt-lambda RATT_LAMBDA Loss multiplier applied to RATT loss
--ratt-thres-cosh RATT_THRES_COSH
--ratt-smax RATT_SMAX Maximum value for scaling parameter s.
--ratt-usage RATT_USAGE Network usage at the beginning of the train task.
--ratt-bin-backward Binarize RATT backward masks.
--ratt-bin-forward Binarize RATT forward masks.
--ratt-emb Enable masks for Embedding layers when executing RATT ablation
--ratt-cls Enable masks for classifier layers when executing RATT ablation
-s SEED, --seed SEED Choose the seed for the current experiment
-w WORKERS, --workers WORKERS Number of workers for dataloader
-g GPU, --gpu GPU GPU to be used from CUDA
--threads THREADS Number of threads that torch will be able to use.
--pin Pin GPU memory.
$ python eval.py --help
Evaluate a pre-trained continual learning model for image captioning.
optional arguments:
-h, --help show this help message and exit
-j, --job {coco-TASFI,flickr30k-SAVI} Select the job name to use for the current experiment.
--test Use test-set instead of validation set
-bs BS Batch size to be used during training and evaluation.
-f, --folder FOLDER [FOLDER ...] Model folders where to load weights from.
-o, --out OUT Output file name. If not specified the model folder name will be used.
-t, --task TASK [TASK ...] Evaluate the model loading weights related to selected tasks. Use -1 (default) to load the last task.
-e, --epoch EPOCH [EPOCH ...] Evaluate model loading weights at the selected epochs of the selected task. Use -1 (default) to load weights at best validation epoch.
--ratt-bin-forward Force binarization of forward masks for RATT approach
-s, --seed SEED Choose the seed for the current experiment
-w, --workers WORKERS Number of workers for dataloader
-g, --gpu GPU GPU to be used from CUDA
--threads THREADS Number of threads that torch will be able to use.
--pin Pin GPU memory.
This script can be used to generate some of the plots shown in the paper, but it's not a command-line tool: the code should be modified to generate the correct plot for the correct model.
The `coco_resize.py` command-line tool can be used to resize all the images in the MS-COCO dataset,
saving a jpeg version of each resized image into a new directory.
The default size is defined in `coco_settings.py` (256); the `--val` parameter is needed
when we want to process the MS-COCO validation set instead of the training set.
This tool is not really needed anymore, because `coco_feats.py` can directly
read the original jpeg images and resize them on the fly before processing with the CNN.
You should use the `--resized-path` flag on `coco_feats.py` if you want to process
images already resized with `coco_resize.py`.
Pre-trained models can be downloaded from releases in this repo.
You can unzip each model folder in the `models/` directory in the root of the project:
RATT-master/models/<MODEL_NAME>/CONTENT-OF-ZIP-FOLDER
For each model and for each task we provide the weights at the best validation epoch (with respect to BLEU-4 score) and at the last trained epoch. We also provide a .info text file containing all the information related to the training parameters and hyperparameters.
These files are needed to correctly execute evaluation over the test-set or to continue the training of a model.
CSV files containing validation performances computed at the end of each training epoch are provided with each model.