pinglmlcv / ReferringExpressions

PyTorch implementations of referring expression networks


Introduction

For more information, read the original paper:

"Generation and comprehension of unambiguous object descriptions." Junhua Mao, Jonathan Huang, Alexander Toshev, Oana Camburu, Alan L. Yuille, Kevin Murphy; The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.

and our paper:

"SUNSpot : An RGB-D dataset with spatial referring expressions." Cecilia Mauceri, Martha Palmer, and Christoffer Heckman; ICCV19 CLVL: 3rd Workshop on Closing the Loop Between Vision and Language, 2019.

Installation

These networks can be run with or without CUDA support. We have tested this project on two machines: a MacBook Pro with an Intel Core i7, and an Ubuntu server with an Intel Xeon processor and NVIDIA P6000 cards.

  1. Install the following packages in your Python environment. We recommend using a fresh Anaconda environment to avoid interfering with other installations.

    • pytorch 1.1
    • Cython
    • tqdm
    • scikit-image
    • yacs
    • tensorflow (for using tensorboard)
    • future

    conda create --name refexp_generation
    conda activate refexp_generation
    
    # Check https://pytorch.org for appropriate pytorch package
    # The following installs vanilla pytorch without CUDA
    conda install pytorch torchvision -c pytorch 
    
    conda install Cython tqdm scikit-image future
    pip install yacs
    
    # Check https://www.tensorflow.org/install for appropriate tensorflow package
    # The following installs vanilla tensorflow without CUDA
    pip install tensorflow
  2. Install the cocoapi

    git clone https://github.com/cocodataset/cocoapi.git
    cd cocoapi/PythonAPI/
    make
    pip install -e .
    cd ../..
  3. For evaluation, install nlg-eval

    # Install Java 1.8.0 (or higher). Then run:
    
    git clone https://github.com/Maluuba/nlg-eval.git
    cd nlg-eval
    
    # Install the Python dependencies.
    # It may take a while to run because it's downloading some files. You can instead run `pip install -v -e .` to see more details.
    pip install -e .
    
    # Download required data files.
    nlg-eval --setup
    
    cd ..

Datasets

SUNSpot

  1. Make a <data_root> directory for SUNSpot, for example data/sunspot/.
  2. Download the SUNRGBD images. The directory you save them in will be your <img_root>.
  3. Download the SUNSpot annotations and unzip them in <data_root>.

Publicly available datasets

Download additional referring expressions datasets from https://github.com/lichengunc/refer

We use MegaDepth to generate synthetic depth images for the COCO dataset.

Make your own referring expressions dataset

  1. Make a directory for your dataset, for example data/<your_dataset>/. This will be your <data_root>.

  2. Make a COCO-style annotation file describing your images and bounding box annotations, and save it as <data_root>/instance.json.

  3. Save your referring expressions as a pickle file, <data_root>/ref(<version_name>).p, with the following structure (a minimal sketch of writing this file appears after this list):

    refs: list of dict [
        {
        image_id : unique image id (int)
        split : train/test/val (str)
        sentences : list of dict [
            {
            tokens : tokenized version of referring expression (list of str)
            raw : unprocessed referring expression (str)
            sent : referring expression with mild processing, lower case, spell correction, etc. (str)
            sent_id : unique referring expression id (int)
            } ...
        ]
        file_name : file name of image relative to img_root (str)
        category_id : object category label (int)
        ann_id : id of object annotation in instance.json (int)
        sent_ids : same ids as nested sentences[...][sent_id] (list of int)
        ref_id : unique id for referring expression (int)
        } ...
    ] 
    
  4. Optional: If you have depth images, make a mapping file, <data_root>/depth.json, which maps image ids to depth file paths:

    {
        <image_id> : file name of depth image relative to depth_root  (str)
        ...    
    }
    
  5. You can check that the dataset loads correctly by running:

    python src/data_management/refer.py --data_root <data_root> --img_root <img_root> --depth_root <depth_root> --version <version_name> --dataset <dataset_name>
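
For reference, here is a minimal sketch of writing ref(<version_name>).p (step 3) and the optional depth.json (step 4) in the structure above. All ids, file names, and sentences are hypothetical placeholders; substitute the values from your own annotations and make sure ann_id matches an annotation in your instance.json.

    import json
    import os
    import pickle
    
    data_root = "data/my_dataset"    # your <data_root>
    version_name = "example"         # your <version_name>
    os.makedirs(data_root, exist_ok=True)
    
    # One referring expression for one annotated object (placeholder values).
    refs = [
        {
            "image_id": 1,
            "split": "train",
            "sentences": [
                {
                    "tokens": ["the", "red", "mug", "on", "the", "table"],
                    "raw": "The red mug on the table.",
                    "sent": "the red mug on the table",
                    "sent_id": 100,
                }
            ],
            "file_name": "images/000001.jpg",   # relative to <img_root>
            "category_id": 47,
            "ann_id": 200,                       # id of the matching annotation in instance.json
            "sent_ids": [100],
            "ref_id": 300,
        }
    ]
    
    with open("{}/ref({}).p".format(data_root, version_name), "wb") as f:
        pickle.dump(refs, f)
    
    # Optional (step 4): map image ids to depth file paths, relative to <depth_root>.
    # Note that json.dump converts the integer keys to strings.
    with open("{}/depth.json".format(data_root), "w") as f:
        json.dump({1: "depth/000001.png"}, f)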

How to Use Networks

Config Files

We use the yacs config system. Configurations are set in three places:

  1. Default configurations

  2. Configuration files

  3. Command line overrides - for example, you can change the number of epochs specified in the config file with:

    python src/run_network.py <config_file> train TRAINING.N_EPOCH 60
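
To illustrate how these three layers combine, here is a minimal sketch using the standard yacs API. The default and file values below are hypothetical; the repository's own defaults and config files define the real options.

    from yacs.config import CfgNode as CN
    
    # 1. Default configuration (hypothetical values for illustration)
    cfg = CN()
    cfg.TRAINING = CN()
    cfg.TRAINING.N_EPOCH = 30
    
    # 2. A configuration file overrides the defaults.
    #    Every key in the file must already exist in the defaults.
    with open("my_config.yaml", "w") as f:
        f.write("TRAINING:\n  N_EPOCH: 45\n")
    cfg.merge_from_file("my_config.yaml")
    
    # 3. Command line key/value pairs override both, as in the example above.
    cfg.merge_from_list(["TRAINING.N_EPOCH", "60"])
    cfg.freeze()
    
    print(cfg.TRAINING.N_EPOCH)  # -> 60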

Config files are provided for the following models:

  1. Baseline - configs/refcocog_baseline.yaml
  2. Baseline+fine - configs/sunspot_baseline.yaml
  3. VGG - configs/refcocog_baseline_custom_vgg.yaml
  4. VGG+D - configs/refcocog_depth_baseline.yaml
  5. VGG+fine - configs/sunspot_baseline_custom_vgg.yaml
  6. VGG+D+fine - configs/sunspot_depth_baseline.yaml

The image classification networks that were pretrained for VGG+D and VGG+D+fine use the config file mscoco_depth_classification_l2_10e-5_BCE.yaml.

Training

Define a config file and run the following:

python src/run_network.py <config_file> train <additional config variables>

Testing

python src/run_network.py <config_file> test <additional config variables>

This will run the most recently saved checkpoint. It will also save the generated referring expressions and comprehension results to output/<cfg.OUTPUT.CHECKPOINT_PREFIX>_<cfg.DATASET.NAME>_<data_split>.json.

Choose which data splits to run on using the following config variables:

# Defaults
cfg.TEST.DO_TRAIN = True # Run on train set
cfg.TEST.DO_VAL = True # Run on val set
cfg.TEST.DO_TEST = True # Run on test set
cfg.TEST.DO_ALL = False # If False, only a random sample of at most 10000 images is tested from each set
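
For example, the command line override mechanism shown earlier can restrict evaluation to the full test split (the config file name here is just one of the provided examples):

    python src/run_network.py configs/sunspot_baseline.yaml test TEST.DO_TRAIN False TEST.DO_VAL False TEST.DO_ALL True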

To calculate evaluation metrics for referring expression networks, run:

python src/mt_metrics.py <config_file> <output_file>

For image classification networks, use:

python src/classification_metrics.py <config_file> <output_file>

License

Licensed under the Apache License, Version 2.0. See LICENSE for additional details.
