baaaad / ECE

[ECCV'22 Poster] Explicit Image Caption Editing

Explicit Image Caption Editing

This repository contains the datasets and reference code for the paper Explicit Image Caption Editing, accepted to ECCV 2022. Refer to the full paper for detailed instructions and analysis.

Overview

The Explicit Caption Editing (ECE) task is defined as follows. Given an image and a reference caption (Ref-Cap), an ECE model explicitly predicts a sequence of edit operations (e.g., KEEP/DELETE/ADD) on the Ref-Cap that transforms it into a caption close to the ground-truth caption (GT-Cap). Typically, the Ref-Cap is slightly misaligned with the image.
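To make the task definition concrete, here is a minimal sketch of how a predicted edit sequence could be applied to a Ref-Cap. The `(name, word)` encoding, the `apply_edits` helper, and the example caption are assumptions made for illustration; they are not taken from this repository or from the paper's implementation.

```python
from typing import List, Tuple

# Hypothetical encoding: each operation is (name, word).
# The word is only used by ADD; KEEP and DELETE ignore it.
EditOp = Tuple[str, str]


def apply_edits(ref_cap: List[str], ops: List[EditOp]) -> List[str]:
    """Apply KEEP/DELETE/ADD operations to a tokenized reference caption."""
    output: List[str] = []
    cursor = 0  # index of the next unread Ref-Cap word
    for name, word in ops:
        if name == "KEEP":
            # Copy the current Ref-Cap word and move on.
            output.append(ref_cap[cursor])
            cursor += 1
        elif name == "DELETE":
            # Skip the current Ref-Cap word.
            cursor += 1
        elif name == "ADD":
            # Insert a new word without consuming a Ref-Cap word.
            output.append(word)
        else:
            raise ValueError(f"unknown edit operation: {name}")
    if cursor != len(ref_cap):
        raise ValueError("operations did not consume the whole reference caption")
    return output


# Illustrative caption, not an instance from COCO-EE or Flickr30K-EE.
ref_cap = "a man riding a horse on a beach".split()
ops = [
    ("KEEP", ""), ("KEEP", ""),
    ("DELETE", ""), ("ADD", "walking"),
    ("KEEP", ""),
    ("DELETE", ""), ("ADD", "dog"),
    ("KEEP", ""), ("KEEP", ""), ("KEEP", ""),
]
edited = apply_edits(ref_cap, ops)
print(" ".join(edited))
```

In this sketch, KEEP and DELETE each consume one Ref-Cap word while ADD consumes none, so a valid sequence must contain exactly as many KEEP and DELETE operations as the Ref-Cap has words.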

ECE datasets

The ECE datasets are COCO-EE and Flickr30K-EE.

Specifically, COCO-EE was built from the MSCOCO dataset, and Flickr30K-EE was built from the e-ViL and Flickr30K datasets.

Each ECE instance contains three main fields:

  • image_id, the original ID of the given image in MSCOCO or Flickr30K.
  • Ref-Cap, the reference caption that needs to be edited.
  • GT-Cap, the ground-truth caption of the given image, which is also the editing target.
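As a rough illustration of the three fields above, one instance could be represented and parsed like this. The JSON key names (`image_id`, `Ref-Cap`, `GT-Cap`), the `ECEInstance` class, and the sample record are assumptions; check the files in the dataset folder for the actual on-disk format.

```python
import json
from dataclasses import dataclass


@dataclass
class ECEInstance:
    """One editing instance (hypothetical in-memory layout)."""
    image_id: str  # original image ID in the source dataset
    ref_cap: str   # reference caption to be edited
    gt_cap: str    # ground-truth caption, the editing target


def parse_instance(record: dict) -> ECEInstance:
    # The key names below are assumed, not confirmed by the repository.
    return ECEInstance(
        image_id=str(record["image_id"]),
        ref_cap=record["Ref-Cap"],
        gt_cap=record["GT-Cap"],
    )


# Made-up record for illustration only.
raw = '{"image_id": 42, "Ref-Cap": "a man riding a horse", "GT-Cap": "a man walking a dog"}'
instance = parse_instance(json.loads(raw))
print(instance)
```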

Examples from COCO-EE and Flickr30K-EE

Statistical summary of the COCO-EE and Flickr30K-EE

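| | COCO-EE Train | COCO-EE Dev | COCO-EE Test | Flickr30K-EE Train | Flickr30K-EE Dev | Flickr30K-EE Test |
|---|---|---|---|---|---|---|
| #Editing instances | 97,567 | 5,628 | 5,366 | 108,238 | 4,898 | 4,910 |
| #Images | 52,587 | 3,055 | 2,948 | 29,783 | 1,000 | 1,000 |
| Mean Reference Caption Length | 10.3 | 10.2 | 10.1 | 7.3 | 7.4 | 7.4 |
| Mean Ground-Truth Caption Length | 9.7 | 9.8 | 9.8 | 6.2 | 6.3 | 6.3 |
| Mean Edit Distance | 10.9 | 11.0 | 10.9 | 8.8 | 8.8 | 8.9 |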

Dataset Construction

The processed datasets are in the dataset folder. They can also be downloaded directly from here, and include COCO-EE and Flickr30K-EE in train, dev and test splits.

Or, you can follow the instructions below to set up the environment and construct them:

COCO-EE Construction

  1. Set up the coco-edit submodule and follow its instructions from this link.

Flickr30K-EE Construction

  1. Set up the environment
    conda create -n flkree python=3.7
    conda activate flkree
    # json and csv are part of the Python standard library; no separate install is needed
  2. Prepare the esnlive data and the output folder
  3. Construct Flickr30K-EE
    python construct_flickr30k_ee.py --split <split>

The ECE model: TIger

The code of our proposed ECE model TIger is now available here.
