Journal of Data-centric Machine Learning Research (DMLR)
More details can be found on the project webpage.
This repository provides the code for generating multimodal robustness evaluation datasets for downstream image-text applications, including image-text retrieval, visual reasoning, visual entailment, image captioning, and text-to-image generation.
If you find our code or models helpful for your research, please cite our paper:
@article{Qiu2022BenchmarkingRO,
title={Benchmarking Robustness of Multimodal Image-Text Models under Distribution Shift},
author={Jielin Qiu and Yi Zhu and Xingjian Shi and F. Wenzel and Zhiqiang Tang and Ding Zhao and Bo Li and Mu Li},
journal={Journal of Data-centric Machine Learning Research (DMLR)},
year={2024}
}
To install the required dependencies, run:

```shell
./install.sh
```
- The original datasets can be downloaded from their original websites.
- For image perturbation, please see image_perturbation.
- For text perturbation, please see text_perturbation.
- For detection score, please see detection_score.
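As a minimal illustration of the kind of image perturbation applied in robustness benchmarks of this sort, the sketch below adds Gaussian noise at a chosen severity level. The five sigma values and the five-level severity scheme are assumptions for illustration, not the exact configuration in image_perturbation; see that directory for the full perturbation suite.

```python
import numpy as np

def gaussian_noise(image: np.ndarray, severity: int = 1) -> np.ndarray:
    """Add Gaussian noise to an image at one of five severity levels.

    `image` is expected as a float array with values in [0, 1].
    The sigma values below are illustrative, not the repository's settings.
    """
    sigmas = [0.04, 0.06, 0.08, 0.09, 0.10]
    sigma = sigmas[severity - 1]
    noisy = image + np.random.normal(0.0, sigma, size=image.shape)
    # Clip back to the valid pixel range so downstream models see legal inputs.
    return np.clip(noisy, 0.0, 1.0)

# Example: perturb a dummy 8x8 grayscale image at severity 3.
img = np.full((8, 8), 0.5)
out = gaussian_noise(img, severity=3)
```

Perturbations like this are typically generated at every severity level for each test image, so that model accuracy can be reported as a function of corruption strength.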
For the text-to-image generation evaluation, we used the captions from COCO as prompts to generate the corresponding images. We also share the generated images here.
For the evaluated baselines, please see evaluated_baselines.
See CONTRIBUTING for more information.
This project is licensed under the Apache-2.0 License.