
[DMLR 2024] Benchmarking Robustness of Multimodal Image-Text Models under Distribution Shift

MM_Robustness

Journal of Data-centric Machine Learning Research (DMLR)

More details can be found on the project webpage.

This repository provides code for generating multimodal robustness evaluation datasets for downstream image-text applications, including image-text retrieval, visual reasoning, visual entailment, image captioning, and text-to-image generation.

Citation

If you find our code or models helpful in your research, please cite our paper:

@inproceedings{Qiu2022BenchmarkingRO,
  title={Benchmarking Robustness of Multimodal Image-Text Models under Distribution Shift},
  author={Jielin Qiu and Yi Zhu and Xingjian Shi and F. Wenzel and Zhiqiang Tang and Ding Zhao and Bo Li and Mu Li},
  journal={Journal of Data-centric Machine Learning Research (DMLR)},
  year={2024}
}

Installation

./install.sh

Datasets

Generate perturbation datasets
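As a rough illustration of what a perturbation generator looks like, here is a minimal sketch of one common image corruption (Gaussian noise at five severity levels, in the style of ImageNet-C-type benchmarks). The function name, severity values, and conventions are illustrative assumptions, not the repo's actual implementation:

```python
import numpy as np

def gaussian_noise(image, severity=1):
    # Hypothetical perturbation: additive Gaussian noise with a sigma
    # chosen per severity level (values here are placeholders, not the
    # repo's actual settings).
    sigma = [0.04, 0.06, 0.08, 0.09, 0.10][severity - 1]
    x = image.astype(np.float32) / 255.0
    noisy = x + np.random.normal(0.0, sigma, size=x.shape)
    # Clip back to the valid pixel range and restore uint8.
    return (np.clip(noisy, 0.0, 1.0) * 255).astype(np.uint8)

# Example: perturb a dummy 8x8 RGB image at severity 3.
img = np.zeros((8, 8, 3), dtype=np.uint8)
out = gaussian_noise(img, severity=3)
```

The real generators cover many more corruption types (for both images and text); see the scripts in this repo for the exact perturbations used in the benchmark.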

Evaluation data for text-to-image generation

For the text-to-image generation evaluation, we used the captions from COCO as prompts to generate the corresponding images. We also share the generated images here.
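The caption-to-prompt step can be sketched as below, using the standard COCO captions annotation schema (`annotations` entries with `image_id` and `caption` fields). The inline record and the `captions_as_prompts` helper are illustrative; the real pipeline reads the full COCO annotations file and feeds each caption to the text-to-image model:

```python
import json

# Minimal COCO-style captions record; the real benchmark loads the full
# annotations file from the COCO release.
coco = {
    "annotations": [
        {"image_id": 139, "caption": "A woman stands in the dining area at the table."},
        {"image_id": 285, "caption": "A grizzly bear is shown with grass in the background."},
    ]
}

def captions_as_prompts(data):
    # Collect (image_id, caption) pairs to use as generation prompts.
    return [(a["image_id"], a["caption"].strip()) for a in data["annotations"]]

prompts = captions_as_prompts(coco)
# Each caption would then be passed to the text-to-image model, e.g.
#   image = model.generate(prompt)
# for every (_, prompt) in prompts. `model.generate` is a placeholder.
```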

Baselines

For the evaluated baselines, please see evaluated_baselines.

Security

See CONTRIBUTING for more information.

License

This project is licensed under the Apache-2.0 License.
