
Composed Image Retrieval on Real-life Images


This repository contains the Composed Image Retrieval on Real-life images (CIRR) dataset.

For details please see our ICCV 2021 paper - Image Retrieval on Real-life Images with Pre-trained Vision-and-Language Models.

Tip

You are currently viewing the Dataset repository.

Site navigation > Project homepage | Code repository

If you wish to develop on this task using our codebase, we recommend first checking out the Code repository, setting up the code locally, and then downloading the dataset.

News and Upcoming Updates

  • Please note there is a typo in our paper (Table 2): the number of pairs in the val split is 4,181, not 4,184 as printed.

Download CIRR Dataset

Our dataset is structured similarly to Fashion-IQ, an existing dataset for this task.

Annotations

Obtain the annotations by:

# create a `data` folder at your desired location
mkdir data
cd data

# clone the cirr_dataset branch to the local data/cirr folder
git clone -b cirr_dataset git@github.com:Cuberick-Orion/CIRR.git cirr

The data/cirr folder contains all relevant annotations. The file structure is described below.

Raw Images

Updated June 2023

Recent methods for Composed Image Retrieval (and related tasks) often use raw images rather than our pre-extracted features. However, we are not at liberty to distribute these images. If you would like to access them, please refer to our image source, NLVR2.

Important

We do not recommend downloading the images by URL, as the URL list contains too many broken links and the downloaded files lack the sub-folder structure of the /train folder. Instead, we suggest following the instructions here to directly access the images. To quote the authors:

To obtain access, please fill out the linked Google Form. This form asks for your basic information and asks you to agree to our Terms of Service. We will get back to you within a week. If you have any questions, please email nlvr@googlegroups.com.

You can also email us if, for any reason, you receive no response from the NLVR2 group.

Pre-extracted Image Features

The available types of image features are ResNet152 image features and Faster R-CNN features, stored under img_feat_res152 and img_feat_frcnn respectively.

Each zip file we provide contains a folder of individual per-image feature files in .pkl format.

Once downloaded, unzip them into data/cirr/, following the file structure below.
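
As a minimal sketch of reading one such feature file (the path below is a hypothetical example based on the file structure in the next section, and the exact content and shape of the loaded object depend on the feature type):

import pickle

# Hypothetical example path -- any <IMG_ID>.pkl under img_feat_res152/ or img_feat_frcnn/ works.
feat_path = "data/cirr/img_feat_res152/test1/test1-147-1-img1.pkl"

with open(feat_path, "rb") as f:
    feature = pickle.load(f)  # typically an array/tensor; shape depends on the feature type

print(type(feature))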

Dataset File Structure

The downloaded dataset should be structured as follows:
data
└─── cirr
    ├─── captions
    │        cap.VER.test1.json
    │        cap.VER.train.json
    │        cap.VER.val.json
    ├─── captions_ext
    │        cap.ext.VER.test1.json
    │        cap.ext.VER.train.json
    │        cap.ext.VER.val.json
    ├─── image_splits
    │        split.VER.test1.json
    │        split.VER.train.json
    │        split.VER.val.json
    ├─── img_raw  
    │    ├── train
    │    │    ├── 0 # sub-level folder structure inherited from NLVR2 (carries no special meaning in CIRR)
    │    │    │    <IMG0_ID>.png
    │    │    │    <IMG1_ID>.png
    │    │    │         ...
    │    │    ├── 1
    │    │    │    <IMG0_ID>.png
    │    │    │    <IMG1_ID>.png
    │    │    │         ...
    │    │    ├── 2
    │    │    │    <IMG0_ID>.png
    │    │    │    <IMG1_ID>.png
    │    │    └──       ...
    │    ├── dev         
    │    │      <IMG0_ID>.png
    │    │      <IMG1_ID>.png
    │    │           ...
    │    └── test1       
    │           <IMG0_ID>.png
    │           <IMG1_ID>.png
    │                ...
    ├─── img_feat_res152 
    │        <Same subfolder structure as above>
    └─── img_feat_frcnn         
             <Same subfolder structure as above>

Dataset File Description

  • captions/cap.VER.SPLIT.json

    • A list of elements, where each element contains core information on a query-target pair.

    • Details on each entry can be found in the supp. mat. Sec. G of our paper.

    • Example:
          {"pairid": 12063, 
          "reference":   "test1-147-1-img1", 
          "target_hard": "test1-83-0-img1", 
          "target_soft": {"test1-83-0-img1": 1.0}, 
          "caption": "remove all but one dog and add a woman hugging   it", 
          "img_set": {"id": 1, 
                      "members": ["test1-147-1-img1", 
                                  "test1-1001-2-img0",  
                                  "test1-83-1-img1",           
                                  "test1-359-0-img1",  
                                  "test1-906-0-img1", 
                                  "test1-83-0-img1"],
                      "reference_rank": 3, 
                      "target_rank": 4}
          }
  • captions_ext/cap.ext.VER.SPLIT.json

    • A list of elements, where each element contains auxiliary annotations on a query-target pair.

    • Details on the auxiliary annotations can be found in the supp. mat. Sec. C of our paper.

    • Example:
          {"pairid": 12063, 
          "reference":   "test1-147-1-img1", 
          "target_hard": "test1-83-0-img1", 
          "caption_extend": {"0": "being a photo of dogs", 
                            "1": "add a big dog", 
                            "2": "more focused on the hugging", 
                            "3": "background should contain grass"}
          }
  • image_splits/split.VER.SPLIT.json

    • A dictionary, where each key:value pair maps an image filename to the relative path of the image file, for example:
      "test1-147-1-img1": "./test1/test1-147-1-img1.png",
      # or
      "train-11041-2-img0": "./train/34/train-11041-2-img0.png"
    • Image filenames and (train-split) sub-level folder structures are preserved from the NLVR2 dataset.
  • img_feat_<...>/

    • A folder containing one type of pre-extracted image features; each file stores the features of one image.
    • Filenames are derived from the image filenames as follows:
      <IMG0_ID> = "test1-147-1-img1.png".replace('.png','.pkl')
      in this case, test1-147-1-img1.pkl, so that each feature file can be directly indexed by its image name (see the loading sketch after this list).
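
To illustrate how the annotation, split, and feature files fit together, here is a minimal loading sketch (not an official loader): it assumes standard json/pickle serialization, uses the val split and the ResNet152 features as an arbitrary choice, and VER must be replaced by the version string in the downloaded filenames.

import json
import pickle
from pathlib import Path

DATA_ROOT = Path("data/cirr")
VER = "..."  # replace with the version string in the downloaded annotation filenames

# Core annotations: a list of dicts, one per query-target pair (see the example entries above).
with open(DATA_ROOT / "captions" / f"cap.{VER}.val.json") as f:
    pairs = json.load(f)

# Split file: maps an image filename to the relative path of the raw image.
with open(DATA_ROOT / "image_splits" / f"split.{VER}.val.json") as f:
    img_paths = json.load(f)

pair = pairs[0]
ref_name = pair["reference"]    # image filename without extension
rel_path = img_paths[ref_name]  # relative path under img_raw/

# The pre-extracted feature file mirrors the raw-image path, with .png replaced by .pkl
# (here using the ResNet152 features; img_feat_frcnn follows the same layout).
feat_path = DATA_ROOT / "img_feat_res152" / Path(rel_path).with_suffix(".pkl")
with open(feat_path, "rb") as f:
    ref_feature = pickle.load(f)

print(pair["caption"], "->", pair["target_hard"])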

Test-split Evaluation Server

We do not publish the ground truth for the test split of CIRR. Instead, an evaluation server is hosted here, should you wish to publish results on the test split. The functionality of the test-split server will be updated incrementally.

See test-split server instructions.

The server is hosted independently at CECS ANU, so please email us if the site is down.

License

  • We have licensed the annotations of CIRR under the MIT License. Please refer to the LICENSE file for details.

  • Following NLVR2 Licensing, we do not license the images used in CIRR, as we do not hold the copyright to them.

  • The images used in CIRR are sourced from the NLVR2 dataset. Users shall be bound by its Terms of Service.

Citation

Please cite our paper if it helps your research:

@InProceedings{Liu_2021_ICCV,
    author    = {Liu, Zheyuan and Rodriguez-Opazo, Cristian and Teney, Damien and Gould, Stephen},
    title     = {Image Retrieval on Real-Life Images With Pre-Trained Vision-and-Language Models},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2021},
    pages     = {2125-2134}
}

Contact

If you have any questions regarding our dataset, model, or publication, please create an issue in the project repository, or email us.

About

Official repository of ICCV 2021 - Image Retrieval on Real-life Images with Pre-trained Vision-and-Language Models

https://cuberick-orion.github.io/CIRR/

License: MIT License