CORE-MM

Complex Open-ended Reasoning Evaluation for Multi-Modal Large Language Models

Table of Contents

News
Leaderboard
Download
Evaluation
Examples
Citation
License
Contact

News

🎉 [2023.12.11] Inference on CORE-MM is now supported in VLMEvalKit.
🎉 [2023.11.18] We released the paper on arXiv.

Leaderboard

The leaderboard is available on Papers with Code and on the project page.

Download

The images and questions can be downloaded here.

Evaluation

To evaluate your model on the CORE-MM benchmark, please follow the steps below:

Step 0: Download Images and Questions

Step 1: Generate Response for Your Model

Generate responses for your model on the CORE-MM dataset. The responses should be saved as a single JSON file that maps each question index to your model's answer, in the following format:

{
  "1": "the answer of question 1",
  "2": "the answer of question 2",
  ...
  "idx": "the answer of question idx"
}
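The format above can be produced with a few lines of Python. This is a minimal sketch, not an official script: the helper name `save_responses` and the shape of the input (a dict from question index to answer text) are assumptions made for illustration.

```python
import json
from pathlib import Path


def save_responses(responses, path):
    """Write a {question index: answer} mapping to a JSON file.

    Hypothetical helper: `responses` may use int or str indices.
    """
    # JSON object keys are strings, so convert every index explicitly.
    payload = {str(idx): str(answer) for idx, answer in responses.items()}
    Path(path).write_text(
        json.dumps(payload, ensure_ascii=False, indent=2),
        encoding="utf-8",
    )
    return payload
```

For example, `save_responses({1: "the answer of question 1"}, "responses.json")` writes a file whose only key is `"1"`.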

Step 2: Send Predictions to Us

After generating the responses, name the JSON file model_name_model_size.json (e.g. CogVLM-Chat_17B.json) and send it to us via email for evaluation.
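The naming convention can be sketched in Python as follows. The function name `prediction_filename` and its parameters are hypothetical and serve only to illustrate the model_name_model_size.json pattern described above.

```python
def prediction_filename(model_name, model_size):
    """Build a file name following the model_name_model_size.json pattern.

    Hypothetical helper, e.g. ("CogVLM-Chat", "17B") -> "CogVLM-Chat_17B.json".
    """
    # Reject path separators so the result is a plain file name.
    for part in (model_name, model_size):
        if "/" in part or "\\" in part:
            raise ValueError(f"unexpected path separator in {part!r}")
    return f"{model_name}_{model_size}.json"
```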

We will evaluate your model and send the results back to you.

Examples

More examples

Citation

@misc{han2023coremm,
      title={CORE-MM: Complex Open-Ended Reasoning Evaluation For Multi-Modal Large Language Models},
      author={Xiaotian Han and Quanzeng You and Yongfei Liu and Wentao Chen and Huangjie Zheng and Khalil Mrini and Xudong Lin and Yiqi Wang and Bohan Zhai and Jianbo Yuan and Heng Wang and Hongxia Yang},
      year={2023},
      eprint={2311.11567},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

License

This project is licensed under CC BY-NC 4.0.

The copyright of the images belongs to the original authors.

See LICENSE for more information.

Contact

If you have any questions, please feel free to contact us by email at infimmbytedance@gmail.com.
