CRoW: Benchmarking Commonsense Reasoning in Real-World Tasks

Paper | Website | Leaderboard | Download data

CRoW is a multi-task benchmark that evaluates the ability of NLP systems to apply commonsense reasoning when solving real-world tasks that require it.

This repo contains the code used to build the CRoW benchmark and to evaluate models on it. To download the benchmark data and evaluate your own models, see the Tasks section. We also maintain an active leaderboard, which you can contribute to by following the Getting Started guide.

For more information on this benchmark, check the website.
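
As a rough illustration, a minimal evaluation loop over a downloaded CRoW task file might look like the sketch below. The file name (crow_task.jsonl), the field names (context, target, label), and the predict stub are hypothetical placeholders rather than the benchmark's actual schema; see the Tasks section for the real data format and evaluation protocol.

    import json

    def predict(context: str, target: str) -> int:
        # Hypothetical model stub: replace with your own system's prediction.
        # The task is framed here as a binary plausibility judgment (an assumption).
        return 1

    # "crow_task.jsonl" and the field names below are illustrative placeholders;
    # the actual files and schema are documented in the Tasks section.
    correct = total = 0
    with open("crow_task.jsonl") as f:
        for line in f:
            example = json.loads(line)
            pred = predict(example["context"], example["target"])
            correct += int(pred == example["label"])
            total += 1

    print(f"Accuracy: {correct / max(total, 1):.3f}")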

Citation

@inproceedings{ismayilzada2023crow,
    title={CRoW: Benchmarking Commonsense Reasoning in Real-World Tasks},
    author={Mete Ismayilzada and Debjit Paul and Syrielle Montariol and Mor Geva and Antoine Bosselut},
    booktitle={Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
    year={2023}
}
