The MuRel network is a machine learning model trained end-to-end to answer questions about images. It relies on the object bounding boxes extracted from the image to build a completely connected graph where each node corresponds to an object or region. The MuRel network contains a MuRel cell over which it iterates to fuse the question representation with local region features, progressively refining visual and question interactions. Finally, after a global aggregation of local representations, it answers the question using a bilinear model. Interestingly, the MuRel network doesn't include an explicit attention mechanism, usually at the core of state-of-the-art models. Its rich vectorial representation of the scene can even be leveraged to visualize the reasoning process at each step.
The MuRel cell is a novel reasoning module which models interactions between question and image regions. Its pairwise relational component enriches the multimodal representations of each node by taking their context into account in the modeling.
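The following toy sketch illustrates the flow described above (fuse the question with each region, model pairwise relations, update each node, iterate, then aggregate globally). It is not the implementation from this repository: the element-wise product used as fusion, the max aggregation, the residual update and the number of steps are all assumptions chosen to keep the example dependency-free.

```python
# Toy sketch only. Element-wise products stand in for the learned fusion
# modules; max-pooling and the residual update are assumed choices.

def fuse(a, b):
    """Stand-in for a learned multimodal fusion: element-wise product."""
    return [x * y for x, y in zip(a, b)]


def toy_cell(question, regions):
    """One reasoning step over a fully connected graph (needs >= 2 regions)."""
    # 1. Fuse the question vector with every local region vector.
    fused = [fuse(question, r) for r in regions]
    updated = []
    for i, m_i in enumerate(fused):
        # 2. Pairwise relations between node i and every other node j.
        relations = [fuse(m_i, m_j) for j, m_j in enumerate(fused) if j != i]
        # 3. Context of node i: element-wise max over its neighbours.
        context = [max(dim) for dim in zip(*relations)]
        # 4. Residual update: original features plus the new evidence.
        updated.append([s + m + c for s, m, c in zip(regions[i], m_i, context)])
    return updated


def toy_network(question, regions, n_steps=3):
    """Iterate the cell, then globally aggregate the local representations."""
    for _ in range(n_steps):
        regions = toy_cell(question, regions)
    # Global aggregation: element-wise max over all regions.
    return [max(dim) for dim in zip(*regions)]
```

In the actual model, the aggregated vector would then be combined with the question by a bilinear model to predict the answer; that last step is left out here.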
In this repo, we make our datasets and models available via pip install. Also, we provide pretrained models and all the code needed to reproduce the experiments from our CVPR 2019 paper.
- Installation
- Quick start
- Reproduce results
- Pretrained models
- Useful commands
- Citation
- Poster
- Authors
- Acknowledgment
We don't provide support for python 2. We advise you to install python 3 with Anaconda. Then, you can create an environment.
conda create --name murel python=3.7
source activate murel
git clone --recursive https://github.com/Cadene/murel.bootstrap.pytorch.git
cd murel.bootstrap.pytorch
pip install -r requirements.txt
Download annotations, images and features for VQA experiments:
bash murel/datasets/scripts/download_vqa2.sh
bash murel/datasets/scripts/download_vgenome.sh
bash murel/datasets/scripts/download_tdiuc.sh
bash murel/datasets/scripts/download_vqacp2.sh
Note: The features have been extracted from a pretrained Faster-RCNN with caffe. We don't provide the code for pretraining or extracting features for now.
By importing the murel Python module, you can access datasets and models in a simple way:
from murel.datasets.vqacp2 import VQACP2
from murel.models.networks.murel_net import MurelNet
from murel.models.networks.murel_cell import MurelCell
from murel.models.networks.pairwise import Pairwise
To be able to do so, you can use pip:
pip install murel.bootstrap.pytorch
Or install from source:
git clone https://github.com/Cadene/murel.bootstrap.pytorch.git
cd murel.bootstrap.pytorch
python setup.py install
Note: This repo is built on top of block.bootstrap.pytorch. We import VQA2, TDIUC, VGenome from the latter.
The bootstrap/run.py file loads the options contained in a yaml file, creates the corresponding experiment directory and starts the training procedure. For instance, you can train our best model on VQA2 by running:
python -m bootstrap.run -o murel/options/vqa2/murel.yaml
Then, several files are going to be created in logs/vqa2/murel:
- options.yaml (copy of options)
- logs.txt (history of print)
- logs.json (batch and epoch statistics)
- view.html (learning curves)
- ckpt_last_engine.pth.tar (checkpoints of last epoch)
- ckpt_last_model.pth.tar
- ckpt_last_optimizer.pth.tar
- ckpt_best_eval_epoch.accuracy_top1_engine.pth.tar (checkpoints of best epoch)
- ckpt_best_eval_epoch.accuracy_top1_model.pth.tar
- ckpt_best_eval_epoch.accuracy_top1_optimizer.pth.tar
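As a quick sanity check after training, the listing above can be turned into a small helper that reports which expected files are absent. This is a sketch, not part of the repository: the function names are hypothetical, and the pattern `ckpt_<name>_<part>.pth.tar` is inferred from the listing rather than taken from the framework's source.

```python
from pathlib import Path

# Names copied from the file listing above.
LOG_FILES = ["options.yaml", "logs.txt", "logs.json", "view.html"]
CKPT_PARTS = ["engine", "model", "optimizer"]


def checkpoint_files(exp_dir, name="last"):
    """Paths of the three checkpoint files for one checkpoint name.

    The naming pattern is an assumption inferred from the listing.
    """
    return [Path(exp_dir) / f"ckpt_{name}_{part}.pth.tar" for part in CKPT_PARTS]


def missing_files(exp_dir, names=("last", "best_eval_epoch.accuracy_top1")):
    """Return the expected files that are absent from the experiment directory."""
    expected = [Path(exp_dir) / f for f in LOG_FILES]
    for name in names:
        expected += checkpoint_files(exp_dir, name)
    return [p for p in expected if not p.exists()]
```

For example, `missing_files("logs/vqa2/murel")` should return an empty list once a full epoch has been trained and evaluated, assuming the layout above.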
Many options are available in the options directory.
At the end of the training procedure, you can evaluate your model on the testing set. In this example, bootstrap/run.py loads the options from your experiment directory, resumes the best checkpoint on the validation set and starts an evaluation on the testing set instead of the validation set, skipping the training set (train_split is empty). Thanks to --misc.logs_name, the logs are written to the new logs_test.txt and logs_test.json files instead of being appended to the logs.txt and logs.json files.
python -m bootstrap.run \
-o logs/vqa2/murel/options.yaml \
--exp.resume best_eval_epoch.accuracy_top1 \
--dataset.train_split \
--dataset.eval_split test \
--misc.logs_name test
We use this simple setup to tune our hyperparameters on the valset.
python -m bootstrap.run \
-o murel/options/vqa2/murel.yaml \
--exp.dir logs/vqa2/murel
This heavier setup allows us to train a model on 95% of the concatenation of the train and val sets, and to evaluate it on the remaining 5%. Then we extract the predictions of our best checkpoint on the testset. Finally, we submit a json file on the EvalAI website.
python -m bootstrap.run \
-o murel/options/vqa2/murel.yaml \
--exp.dir logs/vqa2/murel_trainval \
--dataset.proc_split trainval
python -m bootstrap.run \
-o logs/vqa2/murel_trainval/options.yaml \
--exp.resume best_eval_epoch.accuracy_top1 \
--dataset.train_split \
--dataset.eval_split test \
--misc.logs_name test
Same, but we add pairs from the VisualGenome dataset.
python -m bootstrap.run \
-o murel/options/vqa2/murel.yaml \
--exp.dir logs/vqa2/murel_trainval_vg \
--dataset.proc_split trainval \
--dataset.vg True
python -m bootstrap.run \
-o logs/vqa2/murel_trainval_vg/options.yaml \
--exp.resume best_eval_epoch.accuracy_top1 \
--dataset.train_split \
--dataset.eval_split test \
--misc.logs_name test
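The 95%/5% split of the concatenated train and val sets used in the trainval setups above can be pictured with the sketch below. It only illustrates the idea; the function name, the seed and the shuffling strategy are assumptions, and the dataset code in this repository may split differently.

```python
import random


def split_trainval(items, eval_percent=5, seed=0):
    """Shuffle the items and hold out eval_percent % of them for evaluation.

    Returns (train_items, eval_items). Integer arithmetic avoids
    floating-point rounding surprises when computing the cut.
    """
    items = list(items)
    random.Random(seed).shuffle(items)  # deterministic for a given seed
    n_eval = len(items) * eval_percent // 100
    return items[n_eval:], items[:n_eval]
```

Fixing the seed keeps the held-out 5% identical across runs, which matters when comparing checkpoints from different experiments.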
You can compare experiments by displaying their best metrics on the valset.
python -m murel.compare_vqa_val -d logs/vqa2/murel logs/vqa2/attention
It is not possible to automatically compute the accuracies on the testset. You need to submit a json file on the EvalAI platform. The evaluation step on the testset creates the json file that contains the predictions of your model on the full testset. For instance: logs/vqa2/murel_trainval_vg/results/test/epoch,19/OpenEnded_mscoco_test2015_model_results.json. To get the accuracies on the testdev or test sets, you must submit this file.
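Before uploading, it can be worth checking that the results file is well formed. The sketch below assumes the standard VQA results format, a JSON list of objects with a `question_id` and an `answer` field; the helper name is hypothetical and the check is not part of this repository.

```python
import json


def check_vqa_results(path):
    """Return the number of predictions, raising ValueError on malformed input.

    Assumes the standard VQA format: [{"question_id": ..., "answer": ...}, ...].
    """
    with open(path) as f:
        results = json.load(f)
    if not isinstance(results, list):
        raise ValueError("expected a JSON list of predictions")
    seen = set()
    for entry in results:
        # Every prediction needs both fields.
        if not isinstance(entry, dict) or not {"question_id", "answer"} <= set(entry):
            raise ValueError(f"malformed entry: {entry!r}")
        # Each question should be answered exactly once.
        if entry["question_id"] in seen:
            raise ValueError(f"duplicate question_id: {entry['question_id']}")
        seen.add(entry["question_id"])
    return len(results)
```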
python -m bootstrap.run \
-o murel/options/vqacp2/murel.yaml \
--exp.dir logs/vqacp2/murel
python -m murel.compare_vqa_val -d logs/vqacp2/murel logs/vqacp2/attention
The full training set is split into a trainset and a valset. At the end of the training, we evaluate our best checkpoint on the testset. The TDIUC metrics are computed and displayed at the end of each epoch. They are also stored in logs.json and logs_test.json.
python -m bootstrap.run \
-o murel/options/tdiuc/murel.yaml \
--exp.dir logs/tdiuc/murel
python -m bootstrap.run \
-o logs/tdiuc/murel/options.yaml \
--exp.resume best_eval_epoch.accuracy_top1 \
--dataset.train_split \
--dataset.eval_split test \
--misc.logs_name test
You can compare experiments by displaying their best metrics on the valset or testset.
python -m murel.compare_tdiuc_val -d logs/tdiuc/murel logs/tdiuc/attention
python -m murel.compare_tdiuc_test -d logs/tdiuc/murel logs/tdiuc/attention
TODO
Instead of creating a view.html file, a tensorboard file will be created:
python -m bootstrap.run -o murel/options/vqa2/murel.yaml \
--view.name tensorboard
tensorboard --logdir=logs/vqa2
You can use plotly and tensorboard at the same time by updating the yaml file like this one.
For a specific experiment:
CUDA_VISIBLE_DEVICES=0 python -m bootstrap.run -o murel/options/vqa2/murel.yaml
For the current terminal session:
export CUDA_VISIBLE_DEVICES=0
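The same restriction can be applied from inside a Python script, provided the variable is set before any CUDA-using library initialises the GPU. The helper below is a hypothetical convenience, not part of this repository.

```python
import os


def select_gpus(*gpu_ids):
    """Restrict this process to the given GPU ids.

    Must be called before CUDA is initialised (for example, before the
    first CUDA call made through PyTorch), otherwise it has no effect.
    """
    value = ",".join(str(i) for i in gpu_ids)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value
```

Calling `select_gpus(0)` at the top of a script mirrors the `export` command above for that process only.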
The bootstrap.pytorch framework makes it easy to overwrite a hyperparameter. In this example, we run an experiment with a non-default learning rate. Thus, we also overwrite the experiment directory path:
python -m bootstrap.run -o murel/options/vqa2/murel.yaml \
--optimizer.lr 0.0003 \
--exp.dir logs/vqa2/murel_lr,0.0003
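Conceptually, a dotted flag such as `--optimizer.lr` addresses a nested key of the yaml options. The sketch below shows that mapping on a plain dictionary; it is an illustration only, not the framework's option parser, and the default values in the usage example are hypothetical.

```python
def apply_override(options, dotted_key, value):
    """Set a nested option, e.g. 'optimizer.lr' -> options['optimizer']['lr'].

    Intermediate levels are created when missing. The dictionary is
    modified in place and returned for convenience.
    """
    node = options
    *parents, leaf = dotted_key.split(".")
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value
    return options


# Hypothetical defaults standing in for the content of a yaml options file.
options = {"exp": {"dir": "logs/vqa2/murel"}, "optimizer": {"lr": 0.0001}}
apply_override(options, "optimizer.lr", 0.0003)
apply_override(options, "exp.dir", "logs/vqa2/murel_lr,0.0003")
```

Encoding the overridden value in the directory name, as in `murel_lr,0.0003`, keeps runs with different hyperparameters from writing to the same experiment directory.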
If a problem occurs, it is easy to resume the last epoch by specifying the options file from the experiment directory while overwriting the exp.resume option (default is None):
python -m bootstrap.run -o logs/vqa2/murel/options.yaml \
--exp.resume last
TODO
TODO
@InProceedings{Cadene_2019_CVPR,
author = {Cadene, Remi and Ben-Younes, Hedi and Thome, Nicolas and Cord, Matthieu},
title = {MUREL: {M}ultimodal {R}elational {R}easoning for {V}isual {Q}uestion {A}nswering},
booktitle = {{IEEE} Conference on Computer Vision and Pattern Recognition {CVPR}},
year = {2019},
url = {http://remicadene.com/pdfs/paper_cvpr2019.pdf}
}
TODO
This code was made available by Hedi Ben-Younes (Sorbonne-Heuritech), Remi Cadene (Sorbonne), Matthieu Cord (Sorbonne) and Nicolas Thome (CNAM).
Special thanks to the authors of VQA2, TDIUC, VisualGenome and VQACP2, the datasets used in this research project.