
XMAI for Multimodal Robustness

Repository for ACL'23 Paper: "Cross-Modal Attribute Insertions for Assessing the Robustness of Vision-and-Language Learning"

Authors: Shivaen Ramshetty*, Gaurav Verma*, and Srijan Kumar
Affiliation: Georgia Institute of Technology

Paper (pdf): arXiv, ACL Anthology
Poster (pdf): ACL Underline

Overview Figure

Qualitative Examples

Code, Data, and Resources

We provide an easy-to-follow repository with guided notebooks detailing our baselines, method, and evaluation.

Datasets and Preprocessed Data

The dataset subsets can be downloaded here:

To allow for rapid experimentation, we provide pre-computed objects and attributes for each dataset (a loading sketch follows the list below):

  • MSCOCO Validation 2017: repo
  • SNLI-VE Test: gdrive
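
Below is a minimal sketch of loading the pre-computed detections, assuming they are stored as a JSON file that maps image IDs to detected objects and their attributes. The file name and schema here are assumptions; check the downloaded files for the actual format.

import json

# Hypothetical file name; the actual name of the downloaded pre-computed
# detections may differ (see the links above).
DETECTIONS_PATH = "data/mscoco_val2017_objects_attributes.json"

with open(DETECTIONS_PATH) as f:
    # Assumed schema: {image_id: [{"object": str, "attributes": [str, ...]}, ...]}
    detections = json.load(f)

# Inspect the detections for a single image.
image_id, objects = next(iter(detections.items()))
for obj in objects:
    print(image_id, obj["object"], obj["attributes"])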

Object and Attribute Detection

To perform object and attribute detection yourself:

  1. Set up the Bottom-Up Attention repo or use our Docker image.
  2. Download the pretrained model if setting it up yourself.
  3. Follow the instructions in detector/README.md to capture objects and attributes for the above data or your own; a hedged sketch of reading the detector outputs is shown after this list.
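
As a rough illustration of step 3, the sketch below turns one image's detector output back into human-readable object and attribute labels. The .npz path, the key names (objects_id, attrs_id, attrs_conf), and the vocabulary file locations are assumptions based on common Bottom-Up Attention setups; detector/README.md documents the actual schema.

import numpy as np

# Assumed vocabulary files shipped with the Bottom-Up Attention repo; the
# paths may differ in your checkout.
objects_vocab = [line.strip() for line in open("data/genome/1600-400-20/objects_vocab.txt")]
attrs_vocab = [line.strip() for line in open("data/genome/1600-400-20/attributes_vocab.txt")]

# Assumed per-image output file and key names; consult detector/README.md
# for the format actually produced by the extraction script.
feats = np.load("output/000000000139.npz", allow_pickle=True)
for obj_id, attr_id, attr_conf in zip(feats["objects_id"], feats["attrs_id"], feats["attrs_conf"]):
    attribute = attrs_vocab[attr_id] if attr_conf > 0.1 else ""  # simple confidence cutoff
    print(f"{attribute} {objects_vocab[obj_id]}".strip())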

Augmentation

To augment and evaluate your own data, we provide scripts in XMAI.

Notebooks and data for our paper can be found within paper_experiments.
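
For orientation, here is a toy sketch of the core idea behind cross-modal attribute insertion: prepend an attribute detected in the image to the matching object mention in the text. This is not the paper's full method, which selects insertions using model-based objectives implemented in the XMAI scripts; it only shows what an augmented example looks like.

import re

def insert_attributes(text, detections):
    """Prepend each detected attribute to the first mention of its object."""
    augmented = text
    for obj, attribute in detections:
        pattern = r"\b" + re.escape(obj) + r"\b"
        if re.search(pattern, augmented, flags=re.IGNORECASE):
            augmented = re.sub(pattern, f"{attribute} {obj}", augmented, count=1, flags=re.IGNORECASE)
    return augmented

# Toy example: attributes detected in the image are inserted into the caption.
caption = "A man riding a horse on the beach."
detections = [("horse", "brown"), ("beach", "sandy")]
print(insert_attributes(caption, detections))
# -> "A man riding a brown horse on the sandy beach."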

Baselines

XMAI Method

Evaluation
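
A minimal sketch of the kind of comparison the evaluation makes: task performance on original versus XMAI-augmented inputs, reported as an accuracy drop. The predictions below are toy placeholders; in practice they come from running the target vision-and-language models (e.g., METER or OFA) on both versions of the data.

def accuracy(preds, labels):
    """Fraction of predictions that match the gold labels."""
    return sum(int(p == y) for p, y in zip(preds, labels)) / len(labels)

# Toy placeholder predictions for five examples.
labels     = [0, 1, 2, 1, 0]
orig_preds = [0, 1, 2, 1, 1]  # predictions on the original text
aug_preds  = [0, 2, 2, 0, 1]  # predictions on the XMAI-augmented text

orig_acc, aug_acc = accuracy(orig_preds, labels), accuracy(aug_preds, labels)
print(f"original: {orig_acc:.2f}  augmented: {aug_acc:.2f}  drop: {orig_acc - aug_acc:.2f}")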

Citation

@inproceedings{ramshetty2023xmai,
    title={Cross-Modal Attribute Insertions for Assessing the Robustness of Vision-and-Language Learning},
    author={Ramshetty, Shivaen and Verma, Gaurav and Kumar, Srijan},
    booktitle={Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL)},
    year={2023}
}

Acknowledgements

We thank the authors and contributors of the following repositories:

@misc{yu2020buapt,
  author = {Yu, Zhou and Li, Jing and Luo, Tongan and Yu, Jun},
  title = {A PyTorch Implementation of Bottom-Up-Attention},
  howpublished = {\url{https://github.com/MILVLG/bottom-up-attention.pytorch}},
  year = {2020}
}
@inproceedings{dou2022meter,
  title={An Empirical Study of Training End-to-End Vision-and-Language Transformers},
  author={Dou, Zi-Yi and Xu, Yichong and Gan, Zhe and Wang, Jianfeng and Wang, Shuohang and Wang, Lijuan and Zhu, Chenguang and Zhang, Pengchuan and Yuan, Lu and Peng, Nanyun and Liu, Zicheng and Zeng, Michael},
  booktitle={Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2022},
  url={https://arxiv.org/abs/2111.02387},
}
@article{wang2022ofa,
  author    = {Peng Wang and
               An Yang and
               Rui Men and
               Junyang Lin and
               Shuai Bai and
               Zhikang Li and
               Jianxin Ma and
               Chang Zhou and
               Jingren Zhou and
               Hongxia Yang},
  title     = {OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence
               Learning Framework},
  journal   = {CoRR},
  volume    = {abs/2202.03052},
  year      = {2022}
}
@misc{wu2019detectron2,
  author =       {Yuxin Wu and Alexander Kirillov and Francisco Massa and
                  Wan-Yen Lo and Ross Girshick},
  title =        {Detectron2},
  howpublished = {\url{https://github.com/facebookresearch/detectron2}},
  year =         {2019}
}
