IREG

Official implementation of the paper: "Whether you can locate or not? Interactive Referring Expression Generation"

🔥 News

  • 2024.3.20: Released the codebase.
  • 2023.7.26: Our paper was accepted to the ACM MM 2023 Main Track.

Step 1: Feature extraction

First, download two sets of data (a download sketch follows the list):

  1. COCO 2014: train images and train/val annotations

  2. RefCOCO: RefCOCO, RefCOCO+, RefCOCOg
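
If helpful, here is a minimal download sketch for the COCO side. The URLs are the official COCO hosts; the RefCOCO/RefCOCO+/RefCOCOg archives are distributed via the refer toolkit, so fetch those following its instructions.

# Download the COCO 2014 train images and train/val annotations.
# URLs are the official COCO hosts; destination names are illustrative.
import urllib.request

files = [
    "http://images.cocodataset.org/zips/train2014.zip",
    "http://images.cocodataset.org/annotations/annotations_trainval2014.zip",
]
for url in files:
    name = url.rsplit("/", 1)[-1]
    print("downloading", name)
    urllib.request.urlretrieve(url, name)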

The feature extraction code is under misc/feature_extraction and comes in two variants:

  1. Extract features of the 36 proposal bounding boxes used by the base model: the code is in refcocog_proposal.py
  2. Extract features of a given box: the code is in refcocog_target.py

Feature extraction requires detectron2; follow the installation link in the VL-T5 GitHub repository (installation is a one-line command).
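
Once extraction finishes, here is a minimal sketch of reading the features back, assuming the scripts write HDF5 files in the usual VL-T5 convention (per-image groups with "features" and "boxes" datasets); the file name and keys are illustrative, so check the extraction scripts for the exact layout.

# Hypothetical reader for the extracted features; the file name and
# dataset keys are assumptions, not guaranteed by this repo.
import h5py

with h5py.File("refcocog_proposal_features.h5", "r") as f:
    image_id = next(iter(f.keys()))      # one group per image (assumed)
    feats = f[image_id]["features"][()]  # e.g. (36, 2048) RoI features
    boxes = f[image_id]["boxes"][()]     # e.g. (36, 4) box coordinates
    print(feats.shape, boxes.shape)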

Pretrained Checkpoints

Download from the following sources (a quick load check follows the list):

  1. VLT5 Epoch30.pth: https://drive.google.com/drive/folders/12Acv2YLQSxgrx_-4mahUvqNikcz7XfPi
  2. OFA RefCOCO, RefCOCO+, RefCOCOg base checkpoints; details at https://github.com/OFA-Sys/OFA/blob/main/checkpoints.md#finetuning-ofa-base
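
To sanity-check a download, a quick load sketch assuming standard PyTorch serialization (the path is illustrative):

# Load a downloaded checkpoint on CPU and inspect it; adjust the path
# to wherever you saved Epoch30.pth.
import torch

state = torch.load("ckpt/Epoch30.pth", map_location="cpu")
keys = list(state.keys()) if isinstance(state, dict) else []
print(type(state), keys[:5])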

Environment setup

Python 3.7.4

pip install -r requirements.txt

Start

cd Dialog
bash scripts/REG_VLT5.sh 2 refcoco unc 0 1 25552

Code structure

1.1 ckpt

Stores all checkpoints produced during training.

1.2 misc

Includes feature extraction, bad referring-expression collection, testing, visualization, draft code, etc.

1.3 OFA

The OFA base repository, with the refcoco_eval part modified.

1.4 scripts

Training launch scripts; change the data and pretrained-model weight paths to match your setup.

1.5 src

The main code lives here.

1.5.1 eval_utils

Referring-expression test code.

1.5.2 modeling

Main model files, in Hugging Face style.

1.5.3 tools

Utility functions, parameter files, distributed-training helpers, the base training framework, etc. all live here.

reg_data.py, reg_model.py, reg.py: mainly responsible for base and RL training; the tests here cover only the most basic one-shot setting (a sketch of the RL update follows).
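
For orientation, a minimal sketch of the self-critical REINFORCE update that this kind of RL training typically uses; the model and reward interfaces below are hypothetical placeholders, not this repo's actual API.

import torch

def rl_step(model, optimizer, batch, reward_fn):
    # Self-critical REINFORCE: a sampled expression is scored against a
    # greedy baseline; model.sample/greedy_decode and reward_fn are hypothetical.
    sample_ids, log_probs = model.sample(batch)      # stochastic decoding
    with torch.no_grad():
        greedy_ids = model.greedy_decode(batch)      # baseline decoding
    advantage = reward_fn(sample_ids, batch) - reward_fn(greedy_ids, batch)
    loss = -(advantage * log_probs.sum(dim=1)).mean()  # REINFORCE objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()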

multitask_reg_data.py, multitask_reg_model.py, multitask_reg.py are mainly responsible for:

  • Dialog training: only the Dataset changes; in practice a DialogDataset is added to reg_data.
  • Dialog testing is a little more involved: each generated expression must be checked against OFA to decide whether it can be located, though in practice this amounts to a single function in multitask_reg_model (see the sketch after this list).
  • The main loop of multitask_reg.py adds one extra base model to serve the test: in multitask_reg, self.model is the refiner and basemodel is the base model.
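
To make that control flow concrete, here is a minimal sketch of the dialog test loop: the base speaker generates an expression, OFA tries to locate it, and on failure the refiner produces a new expression from the feedback. Every name below (generate, locate, refine, the sample fields) is a hypothetical placeholder, not this repo's actual interface.

def iou(a, b):
    # Intersection-over-union of two [x1, y1, x2, y2] boxes.
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = (a[2]-a[0]) * (a[3]-a[1]) + (b[2]-b[0]) * (b[3]-b[1]) - inter
    return inter / union if union > 0 else 0.0

def dialog_test(basemodel, refiner, ofa, sample, max_rounds=3, iou_thresh=0.5):
    expression = basemodel.generate(sample)                 # round 0: base speaker
    for round_idx in range(max_rounds):
        pred_box = ofa.locate(expression, sample["image"])  # comprehension step
        if iou(pred_box, sample["gt_box"]) >= iou_thresh:   # located successfully
            return expression, round_idx
        # Failed: refine the expression using the wrong box as feedback.
        expression = refiner.refine(sample, expression, pred_box)
    return expression, max_rounds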

✒ Citation

Please cite our paper if you find it helpful :)

@misc{ye2023locate,
      title={Whether you can locate or not? Interactive Referring Expression Generation}, 
      author={Fulong Ye and Yuxing Long and Fangxiang Feng and Xiaojie Wang},
      year={2023},
      eprint={2308.09977},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
