Modeling Relationships in Referential Expressions with Compositional Modular Networks

This repository contains the code for the following paper:

R. Hu, M. Rohrbach, J. Andreas, T. Darrell, K. Saenko. Modeling Relationships in Referential Expressions with Compositional Modular Networks. In CVPR, 2017.

@article{hu2017modeling,
  title={Modeling Relationships in Referential Expressions with Compositional Modular Networks},
  author={Hu, Ronghang and Rohrbach, Marcus and Andreas, Jacob and Darrell, Trevor and Saenko, Kate},
  journal={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  year={2017}
}

Project Page: http://ronghanghu.com/cmn

Note: part of this repository is built upon the Faster RCNN code (https://github.com/rbgirshick/py-faster-rcnn), which is under the MIT License.

Installation

Install Python 3 (Anaconda recommended: https://www.continuum.io/downloads)
Install TensorFlow (v1.0.0 or higher) following the official TensorFlow installation instructions. TensorFlow must be installed with GPU support.
Download this repository or clone it with Git, and then enter the root directory of the repository:
git clone https://github.com/ronghanghu/cmn.git && cd cmn
Depending on your system, you may need to rebuild the NMS library and the RoI pooling operation:

export CMN_ROOT=$(pwd)
cd $CMN_ROOT/util/faster_rcnn_lib/ && make
cd $CMN_ROOT/util/roi_pooling/ && ./compile_roi_pooling.sh
cd $CMN_ROOT

compile_roi_pooling.sh uses g++-4.8 and CUDA 8.0 to match the binary installation of TensorFlow 1.0.0 on Linux. If you installed TensorFlow from source with a different compiler or CUDA version, modify compile_roi_pooling.sh to match your installation.
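Before building or training, it can help to confirm that the installed TensorFlow meets the requirements above. The following is a minimal sketch and is not part of this repository: the helper names `parse_version`, `meets_minimum` and `check_tensorflow` are hypothetical, and the check covers only the v1.0.0 lower bound and whether the build reports CUDA support through `tf.test.is_built_with_cuda()`.

```python
import re


def parse_version(version):
    """Turn a version string such as '1.0.0' or '1.4.0-rc1' into a tuple of ints.

    Parsing stops at the first component that does not start with a digit,
    and trailing suffixes such as '-rc1' are dropped.
    """
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(version, minimum=(1, 0, 0)):
    """Return True if `version` is at least `minimum`."""
    parsed = parse_version(version)
    # Pad with zeros so that '1.0' compares equal to (1, 0, 0).
    parsed += (0,) * (len(minimum) - len(parsed))
    return parsed >= minimum


def check_tensorflow(minimum=(1, 0, 0)):
    """Return a list of problems with the installed TensorFlow (empty if none)."""
    try:
        import tensorflow as tf
    except ImportError:
        return ["TensorFlow is not installed"]
    problems = []
    if not meets_minimum(tf.__version__, minimum):
        problems.append("TensorFlow %s is older than %s"
                        % (tf.__version__, ".".join(str(n) for n in minimum)))
    # Reports whether this TensorFlow binary was built with CUDA (GPU) support.
    if not tf.test.is_built_with_cuda():
        problems.append("this TensorFlow build has no GPU (CUDA) support")
    return problems


if __name__ == "__main__":
    issues = check_tensorflow()
    for issue in issues:
        print("Problem:", issue)
    if not issues:
        print("No problems found.")
```

The sketch only enforces the lower bound stated above; it does not confirm that a GPU is visible at run time, nor that later TensorFlow releases remain compatible with this code.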

Download data

Download the VGG-16 network weights (and the Faster R-CNN VGG-16 network weights) converted from the Caffe models:
./models/convert_caffemodel/params/download_vgg_params.sh
Download the GloVe matrix for word embedding:
./word_embedding/download_embed_matrix.sh

Training and evaluation on the Visual Genome relationship dataset

Download the Visual Genome dataset from http://visualgenome.org/ and symlink it to exp-visgeno-rel/visgeno-dataset
Download the image data (imdb) for training and evaluation:
./exp-visgeno-rel/data/download_imdb.sh
Alternatively, you may build the imdb yourself using ./exp-visgeno-rel/build_visgeno_imdb.ipynb
Add the repository root directory to Python's module path:
export PYTHONPATH=.:$PYTHONPATH
Train the model:
Strong supervision: python ./exp-visgeno-rel/exp_train_visgeno_attbilstm_strong.py
Weak supervision: python ./exp-visgeno-rel/exp_train_visgeno_attbilstm_weak.py
Evaluate the model:
Subject region precision: python ./exp-visgeno-rel/exp_test_visgeno_attbilstm.py
Subject-object pair precision: python ./exp-visgeno-rel/exp_test_visgeno_pair_attbilstm.py
(in the above files, change the model path to your snapshot path in ./exp-visgeno-rel/tfmodel)
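Linking each downloaded dataset into its experiment folder amounts to running `ln -s /path/to/dataset <link>` from the repository root. A Python sketch of the same step is below. `link_dataset` and `DATASET_LINKS` are hypothetical names; the link targets are the three paths given in this README, and the source directory is wherever you downloaded each dataset.

```python
import os

# Link targets named in this README. The source directories are wherever
# you downloaded each dataset and must be supplied by the caller.
DATASET_LINKS = {
    "visgeno": "exp-visgeno-rel/visgeno-dataset",
    "refgoog": "exp-refgoog/refgoog-dataset",
    "visual7w": "exp-visual7w/visual7w-dataset",
}


def link_dataset(source_dir, link_path):
    """Create a symlink at link_path pointing to source_dir.

    Returns True if a new link was created and False if link_path already
    points to source_dir. Raises instead of overwriting anything else.
    """
    source_dir = os.path.abspath(source_dir)
    if not os.path.isdir(source_dir):
        raise FileNotFoundError("dataset directory not found: %s" % source_dir)
    if os.path.islink(link_path):
        if os.path.realpath(link_path) == os.path.realpath(source_dir):
            return False
        raise FileExistsError("%s already links elsewhere" % link_path)
    if os.path.exists(link_path):
        raise FileExistsError("%s exists and is not a symlink" % link_path)
    os.symlink(source_dir, link_path)
    return True
```

From the repository root, `link_dataset("/data/visual_genome", DATASET_LINKS["visgeno"])` would create the Visual Genome link, where `/data/visual_genome` is a placeholder for your download location. The helper refuses to replace an existing file, directory or differently-targeted link, so rerunning it is safe.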

Training and evaluation on the Google-Ref dataset

Download the Google-Ref dataset from https://github.com/mjhucla/Google_Refexp_toolbox and symlink it to exp-refgoog/refgoog-dataset
Download the image data (imdb) for training and evaluation:
./exp-refgoog/data/download_imdb.sh
Alternatively, you may build the imdb yourself using ./exp-refgoog/build_refgoog_imdb.ipynb
Add the repository root directory to Python's module path:
export PYTHONPATH=.:$PYTHONPATH
Train the model:
python ./exp-refgoog/exp_train_refgoog_attbilstm.py
Evaluate the model (this generates a prediction output file):
python ./exp-refgoog/exp_test_refgoog_attbilstm.py
(in the above file, change the model path to your snapshot path in ./exp-refgoog/tfmodel)
Score the prediction output file with the evaluation tool included in the Google-Ref dataset.
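The experiment scripts rely on the repository root being on `PYTHONPATH`. If you prefer to drive them from Python rather than a shell, the sketch below gives a child process the equivalent of `export PYTHONPATH=.:$PYTHONPATH`. The helper names `env_with_repo_on_path` and `run_script` are hypothetical and not part of this repository.

```python
import os
import subprocess
import sys


def env_with_repo_on_path(repo_root, base_env=None):
    """Return a copy of the environment with repo_root prepended to PYTHONPATH.

    Mirrors `export PYTHONPATH=.:$PYTHONPATH` run from the repository root.
    """
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("PYTHONPATH", "")
    root = os.path.abspath(repo_root)
    env["PYTHONPATH"] = root + (os.pathsep + existing if existing else "")
    return env


def run_script(script, repo_root="."):
    """Run one experiment script from repo_root; raise if it exits with an error."""
    subprocess.run(
        [sys.executable, script],
        cwd=repo_root,
        env=env_with_repo_on_path(repo_root),
        check=True,  # raises subprocess.CalledProcessError on a non-zero exit
    )
```

From the repository root, calling `run_script("./exp-refgoog/exp_train_refgoog_attbilstm.py")` and then `run_script("./exp-refgoog/exp_test_refgoog_attbilstm.py")` would train the model and then write the prediction file, stopping at the first script that fails. The child uses the same interpreter as the caller (`sys.executable`), which avoids picking up a different `python` from `PATH`.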

Training and evaluation on the Visual-7W dataset (pointing task)

Download the Visual-7W dataset from http://web.stanford.edu/~yukez/visual7w/index.html and symlink it to exp-visual7w/visual7w-dataset
Download the image data (imdb) for training and evaluation:
./exp-visual7w/data/download_imdb.sh
Alternatively, you may build the imdb yourself using ./exp-visual7w/build_visual7w_imdb.ipynb, ./exp-visual7w/extract_rpn_proposals.py and ./exp-visual7w/build_visual7w_imdb_attention.ipynb
Add the repository root directory to Python's module path:
export PYTHONPATH=.:$PYTHONPATH
Train the model:
python exp-visual7w/exp_train_visual7w_attbilstm.py
Evaluate the model:
python exp-visual7w/exp_test_visual7w_attbilstm.py
(in the above file, change the model path to your snapshot path in ./exp-visual7w/tfmodel)
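Each evaluation step asks you to point the test script at a snapshot in the experiment's tfmodel folder. The sketch below picks the snapshot with the highest iteration number. It assumes TensorFlow V2-format checkpoints (one `<prefix>.index` file per snapshot) and that the iteration count is the last integer in the file name; both are assumptions about how the training scripts name their snapshots, and `latest_snapshot` is a hypothetical helper.

```python
import os
import re


def latest_snapshot(tfmodel_dir):
    """Return the checkpoint prefix with the highest iteration number, or None.

    Assumes (hypothetically) V2-format checkpoints stored as '<prefix>.index'
    plus '<prefix>.data-*' files, with the iteration as the last integer in
    <prefix>. Files without a '.index' suffix or without digits are skipped.
    """
    best_iter, best_prefix = -1, None
    for name in os.listdir(tfmodel_dir):
        if not name.endswith(".index"):
            continue
        prefix = name[: -len(".index")]
        numbers = re.findall(r"\d+", prefix)
        if not numbers:
            continue
        iteration = int(numbers[-1])
        if iteration > best_iter:
            best_iter, best_prefix = iteration, os.path.join(tfmodel_dir, prefix)
    return best_prefix
```

`latest_snapshot("./exp-visual7w/tfmodel")` returns the checkpoint prefix (the path without the `.index` suffix), which is the form `tf.train.Saver.restore` expects for V2 checkpoints. If the snapshots were saved by `tf.train.Saver` with its default `checkpoint` state file, `tf.train.latest_checkpoint("./exp-visual7w/tfmodel")` should return the most recently saved one without any naming assumption.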

Training and evaluation on the synthetic shapes dataset

Add the repository root directory to Python's module path:
export PYTHONPATH=.:$PYTHONPATH
Download the synthetic shapes dataset:
./exp-shape/data/download_shape_data.sh
Train the model:
python ./exp-shape/exp_train_shape_attention.py
Evaluate the model:
python ./exp-shape/exp_test_shape_attention.py
(in the above file, change the model path to your snapshot path in ./exp-shape/tfmodel)
