
Mask Grounding


This repository contains the official code for the CVPR 2024 paper:

Mask Grounding for Referring Image Segmentation

Introduction

Mask Grounding is an auxiliary training task that significantly improves existing Referring Image Segmentation models by explicitly teaching them fine-grained visual grounding in their language features. During training, the model receives sentences in which some textual tokens are randomly masked, and it must predict the identities of the masked tokens from the surrounding textual, visual, and segmentation information.
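The token-masking step described above can be sketched as follows. This is a minimal, hypothetical illustration (the helper name `mask_tokens`, the mask probability, and the `[MASK]` placeholder are assumptions, not the repository's actual implementation, which operates on tokenizer IDs inside the training pipeline):

```python
import random

MASK_TOKEN = "[MASK]"  # placeholder; a real pipeline would use the tokenizer's mask id


def mask_tokens(tokens, mask_prob=0.3, rng=None):
    """Randomly replace tokens with MASK_TOKEN.

    Returns (masked_tokens, targets), where targets maps each masked
    position to the original token the model must learn to predict.
    """
    rng = rng or random.Random(0)
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            masked.append(MASK_TOKEN)
            targets[i] = tok  # supervision signal for the grounding head
        else:
            masked.append(tok)
    return masked, targets


tokens = "the dog on the left wearing a red collar".split()
masked, targets = mask_tokens(tokens)
```

In the actual method, the prediction is conditioned not only on the remaining text but also on the image and segmentation features, which is what forces the language branch to learn visual grounding.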


Setup

  1. Follow LAVT instructions for environment set-up and data preparation.
  2. Download pretrained weights.

Testing

Running bash run_test.sh reproduces all results reported in the paper.

Results

| Method        | RefCOCO (val) | RefCOCO (testA) | RefCOCO (testB) | RefCOCO+ (val) | RefCOCO+ (testA) | RefCOCO+ (testB) | G-Ref (val(U)) | G-Ref (test(U)) | G-Ref (val(G)) |
|---------------|---------------|-----------------|-----------------|----------------|------------------|------------------|----------------|-----------------|----------------|
| CRIS          | 70.47         | 73.18           | 66.10           | 62.27          | 68.08            | 53.68            | 59.87          | 60.36           | -              |
| LAVT          | 72.73         | 75.82           | 68.79           | 62.14          | 68.38            | 55.10            | 61.24          | 62.09           | 60.50          |
| ReLA          | 73.82         | 76.48           | 70.18           | 66.04          | 71.02            | 57.65            | 65.00          | 65.97           | 62.70          |
| MagNet (Ours) | 75.24         | 78.24           | 71.05           | 66.16          | 71.32            | 58.14            | 65.36          | 66.03           | 63.13          |

Citation

@inproceedings{chng2023mask,
  title={Mask Grounding for Referring Image Segmentation},
  author={Chng, Yong Xien and Zheng, Henry and Han, Yizeng and Qiu, Xuchong and Huang, Gao},
  booktitle={CVPR},
  year={2024}
}

Reference

This code builds on LAVT, Mask2Former-Simplify, and ovr-cnn.

License

GNU Affero General Public License v3.0

