Break: A Question Understanding Benchmark

This repository contains the code and models for our paper, "Break It Down: A Question Understanding Benchmark" (TACL 2020). The code was written by Mor Geva, Ankit Gupta and Tomer Wolfson.
For the Break dataset, please refer to: https://allenai.github.io/Break

Break is a human-annotated dataset of natural language questions and their Question Decomposition Meaning Representations (QDMRs). It consists of 83,978 examples sampled from 10 question answering datasets over text, images and databases.
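
To give a rough feel for what a QDMR looks like, the sketch below splits a decomposition into ordered steps and records which earlier steps each one refers to. It assumes, purely for illustration, that a decomposition is serialized as one string of steps separated by semicolons, that each step begins with `return`, and that earlier steps are referenced as `#k`. The example string, the `Step` class and the `parse_decomposition` helper are hypothetical and are not part of this repository; check the released files for the actual format.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    index: int        # 1-based position of the step in the decomposition
    text: str         # step text with the leading "return" removed
    refs: List[int]   # indices of earlier steps referenced as "#k"


def parse_decomposition(decomposition: str) -> List[Step]:
    """Split an (assumed) semicolon-separated QDMR string into steps."""
    # Drop empty fragments, e.g. from a trailing semicolon.
    raw_steps = [s.strip() for s in decomposition.split(";") if s.strip()]
    steps = []
    for i, text in enumerate(raw_steps, start=1):
        if text.startswith("return "):
            text = text[len("return "):]
        refs = [int(k) for k in re.findall(r"#(\d+)", text)]
        steps.append(Step(index=i, text=text, refs=refs))
    return steps


# Made-up decomposition for a question like
# "How many flights from Denver are there?"
example = "return flights ;return #1 from Denver ;return the number of #2"

for step in parse_decomposition(example):
    print(step.index, step.text, step.refs)
```

Each step can only refer to steps that precede it, so the `refs` lists describe a small dependency graph over the steps of a question.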

Changelog

4/10/2020 Pretrained QDMR Parsing models are now available.
2/24/2020 Open-domain QA experiments are now available.
2/20/2020 QDMR parsing models and evaluation are now available.
2/1/2020 The full dataset has been publicly released at https://allenai.github.io/Break.

Structure

The repository features:

- The QDMR Parsing models, by Mor Geva
  - Pretrained QDMR Parsing models
- The Open-domain QA models utilizing QDMR, by Ankit Gupta
- The annotation pipeline of Break
- Code for converting QDMR to logical form

Reference

@article{Wolfson2020Break,
  title={Break It Down: A Question Understanding Benchmark},
  author={Wolfson, Tomer and Geva, Mor and Gupta, Ankit and Gardner, Matt and Goldberg, Yoav and Deutch, Daniel and Berant, Jonathan},
  journal={Transactions of the Association for Computational Linguistics},
  year={2020},
}

About

MIT License
