Attribute Phrases

This repository contains the dataset and the TensorFlow training code used in the paper:

Jong-Chyi Su*, Chenyun Wu*, Huaizu Jiang, Subhransu Maji, "Reasoning about Fine-grained Attribute Phrases using Reference Games", International Conference on Computer Vision (ICCV), 2017

@inproceedings{su2017reasoning,
    Author = {Jong-Chyi Su and Chenyun Wu and Huaizu Jiang and Subhransu Maji},
    Title = {Reasoning about Fine-grained Attribute Phrases using Reference Games},
    Booktitle = {International Conference on Computer Vision (ICCV)},
    Year = {2017}
}

[Project page] [Paper]

Dataset

Each example consists of one pair of images and five pairs of corresponding attribute phrases, one phrase describing each image in the pair. For example (Image 1 vs. Image 2):

  • commercial plane vs private plane
  • large plane vs small plane
  • white and grey vs white with blue and red stripes
  • twin engines vs single engine
  • more windows on body vs less windows on body

Stats about the dataset

  • Training set: 4700 pairs
  • Val set: 2350 pairs
  • Test set: 2350 pairs

Requirements

  • Python 2.7
  • TensorFlow v1.0+

Download Dataset

  • User descriptions are included in dataset/visdiff_SET.json, where SET = {train, val, test, trainval} (a loading sketch follows this list)
  • Download the images from the OID dataset (http://www.robots.ox.ac.uk/~vgg/data/oid)
  • Move the images from oid-aircraft-beta-1/data/images/aeroplane/*.jpg to the folder dataset/images/
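
A minimal loading sketch for the annotation file (the internal JSON schema is not documented here, so the code below only inspects the raw structure rather than assuming any field names):

    import json

    # Load the training split of user-written attribute phrases
    with open('dataset/visdiff_train.json') as f:
        data = json.load(f)

    # Print the top-level structure before relying on specific field names
    print(type(data))
    first = data[0] if isinstance(data, list) else next(iter(data.items()))
    print(first)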

Download ImageNet Pre-trained Model

Place the ImageNet pre-trained model (e.g., vgg_16.ckpt) in models/checkpoints/.
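
If you do not already have a checkpoint, the sketch below fetches a VGG-16 checkpoint and unpacks it into models/checkpoints/; the download URL is an assumption based on the public TF-Slim model zoo:

    import os
    import tarfile

    try:
        from urllib import urlretrieve          # Python 2
    except ImportError:
        from urllib.request import urlretrieve  # Python 3

    # URL assumed from the public TF-Slim model zoo
    url = 'http://download.tensorflow.org/models/vgg_16_2016_08_28.tar.gz'
    dest_dir = 'models/checkpoints'
    if not os.path.isdir(dest_dir):
        os.makedirs(dest_dir)

    archive = os.path.join(dest_dir, 'vgg_16_2016_08_28.tar.gz')
    urlretrieve(url, archive)
    with tarfile.open(archive) as tar:
        tar.extractall(dest_dir)  # should produce models/checkpoints/vgg_16.ckpt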

Extract image features to a NumPy file to accelerate training

Go to utils/ and run python get_feature.py --dataset train. The extracted features will be saved to img_feat/vgg_16/train.npy.
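
A quick sanity check on the extracted features (the array layout depends on the image model and on how get_feature.py stores it, so nothing beyond loading the stated path is assumed):

    import numpy as np

    # Features written by utils/get_feature.py for the training split
    feats = np.load('img_feat/vgg_16/train.npy')
    print(feats.shape)
    print(feats.dtype)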

Train Listener Model

Listener models are trained in two steps: Step 1 trains with the image features fixed, and Step 2 fine-tunes the image features starting from the Step 1 checkpoint.

SL (Simple Listener)

  1. python train_listener.py --mode train --log_dir result/SL --pairwise 0 --train_img_model 0 --max_steps 2000 --batch_size 128
  2. python train_listener.py --mode train --log_dir result/SL --pairwise 0 --train_img_model 1 --max_steps 7500 --load_model_path model-fixed-2000 --learn_rate 0.00001

SLr (Simple Listener trained w/o contrastive data)

  1. python train_listener.py --mode train --log_dir result/SLr --pairwise 0 --ran_neg_sample 1 --train_img_model 0 --max_steps 5000 --batch_size 128
  2. python train_listener.py --mode train --log_dir result/SLr --pairwise 0 --ran_neg_sample 1 --train_img_model 1 --max_steps 10000 --load_model_path model-fixed-5000 --learn_rate 0.00001

DL (Discerning Listener)

  1. python train_listener.py --mode train --log_dir result/DL --pairwise 1 --train_img_model 0 --max_steps 2000 --max_sent_length 17 --batch_size 128
  2. python train_listener.py --mode train --log_dir result/DL --pairwise 1 --train_img_model 1 --max_steps 7000 --load_model_path model-fixed-2000 --max_sent_length 17 --learn_rate 0.00001

Evaluate Listener Model

SL

  1. python train_listener.py --mode eval --log_dir result/SL --pairwise 0 --train_img_model 0 --load_model_path model-fixed-2000 --dataset val
  2. python train_listener.py --mode eval --log_dir result/SL --pairwise 0 --train_img_model 1 --load_model_path model-finetune-7500 --dataset val

SLr

  1. python train_listener.py --mode eval --log_dir result/SLr --pairwise 0 --train_img_model 0 --load_model_path model-fixed-5000 --dataset val
  2. python train_listener.py --mode eval --log_dir result/SLr --pairwise 0 --train_img_model 1 --load_model_path model-finetune-10000 --dataset val

DL

  1. python train_listener.py --mode eval --log_dir result/DL --pairwise 1 --train_img_model 0 --load_model_path model-fixed-2000 --dataset val
  2. python train_listener.py --mode eval --log_dir result/DL --pairwise 1 --train_img_model 1 --load_model_path model-finetune-7000 --dataset val

Train Speaker Model

  • Example: python train_speaker.py --speaker_mode=S --img_model=vgg_16 --train_img_model=1 --experiment_path=result/speaker/temp
  • Options:
    • --speaker_mode: S or DS
    • --img_model: alexnet, inception_v3, or vgg_16
    • --train_img_model: Whether to fine-tune the image model (0 = False, 1 = True)
    • --experiment_path: where to output and save the trained model
    • --load_model_dir: path to the pre-trained model. If not set, train from scratch
    • --load_model_name: model name (model-%steps) in load_model_dir
    • See more options in train_speaker.py
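  • For instance, to train a DS speaker starting from a previously trained checkpoint (the paths and step number below are placeholders for whatever you trained earlier): python train_speaker.py --speaker_mode=DS --img_model=vgg_16 --train_img_model=1 --experiment_path=result/speaker/DS --load_model_dir=result/speaker/temp --load_model_name=model-5000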

Use Speaker to Generate Attribute Phrases

  • Example: python inference_pairwise.py --input_path=result/speaker/temp --model_step=model-5000 --dataset_name=val
  • Options:
    • --input_path: path to the trained speaker model that you want to use
    • --model_step: model name (model-%steps) in input_path
    • --dataset_name: which sub-dataset to use (train / val / test)
    • See more options in inference_pairwise.py
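
A small inspection sketch for the generated output (the file name below follows the pattern used in the re-ranking example in the next section; the JSON schema is not documented here, so only the raw structure is printed):

    import json

    # Output of inference_pairwise.py; the exact file name depends on your settings
    path = 'result/speaker/temp/infer_annotations_val_model-5000_case0_beam10_sent10.json'
    with open(path) as f:
        results = json.load(f)

    print(type(results))
    first = results[0] if isinstance(results, list) else next(iter(results.items()))
    print(first)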

Discerning Speaker Model

Here we use the listener model to re-rank the attribute phrases generated by a speaker model. To run this step, you need a trained listener model and phrases generated by a speaker model.

  • Example: python rerank.py --listener_path=result/SL --listener_model=model-fixed-2000 --speaker_result_path=result/speaker/temp/infer_annotations_val_model-5000_case0_beam10_sent10.json --infer_dataset=val
  • Options:
    • --listener_path: path to the listener model used for reranking
    • --listener_model: model name (model-%steps) in listener_path
    • --speaker_result_path: the file that saves the phrases generated by a speaker model
    • --infer_dataset: which dataset to work on (train / val / test)
    • See more options in rerank.py
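
Conceptually, re-ranking just reorders the speaker's candidate phrase pairs by how confidently a listener resolves the reference game. A minimal illustration of that idea (not the rerank.py implementation; listener_score is a stand-in for a real listener probability):

    # Re-ranking idea: keep the speaker's candidates, order them by listener confidence.
    def rerank(candidates, listener_score):
        """candidates: list of (phrase_for_image1, phrase_for_image2) from a speaker.
        listener_score: callable returning P(target image | phrase pair) from a listener."""
        return sorted(candidates, key=listener_score, reverse=True)

    # Hypothetical usage with a dummy scorer standing in for a trained listener
    dummy_score = lambda pair: len(pair[0])
    print(rerank([('twin engines', 'single engine'), ('large plane', 'small plane')], dummy_score))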

Generate Set-wise Attribute Phrases

  • In "inference_setwise.py", set "speaker_path" as the path to the trained speaker model you want to use
  • run python inference_setwise.py

Authors

Please contact jcsu@cs.umass.edu if you have any questions.

  • Jong-Chyi Su (UMass Amherst)
  • Chenyun Wu (UMass Amherst)
  • Huaizu Jiang (UMass Amherst)
