petezh / OpenD5

Tasks for describing differences between text distributions.

Home Page: https://arxiv.org/pdf/2302.14233.pdf

OpenD5

Authors: Ruiqi Zhong, Peter Zhang, Steve Li, JinWoo Ahn, Dan Klein, Jacob Steinhardt

Paper link

This repository hosts OpenD5, a benchmark for discovering natural language facts from pairs of corpora. Our paper focuses on the setting of comparing two distributions of text and describing the difference in natural language.

The benchmark spans a wide array of disciplines and problem types. A sibling repository that contains code for running our system for solving these problems is available here.

To create the full benchmark, 1) download these folders and 2) run the build_benchmark.sh script from the main repo.
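
The two steps above can be wrapped in a small helper. This is a minimal sketch, not part of the repository: the function name `build_benchmark`, its `dry_run` flag, and the idea that the downloaded folders sit directly inside the repo root are all assumptions made for illustration. Only the script name `build_benchmark.sh` comes from this README.

```python
import subprocess
from pathlib import Path


def build_benchmark(repo_dir, downloaded_folders, dry_run=False):
    """Check that the downloaded folders are in place, then run build_benchmark.sh.

    Hypothetical helper: `downloaded_folders` holds whatever folder names you
    downloaded; this sketch assumes they were placed in the repo root.
    """
    repo = Path(repo_dir)
    script = repo / "build_benchmark.sh"  # script name taken from the README
    if not script.is_file():
        raise FileNotFoundError(f"build script not found: {script}")

    # Fail early if any downloaded folder is missing from the repo root.
    missing = [name for name in downloaded_folders if not (repo / name).is_dir()]
    if missing:
        raise FileNotFoundError(f"missing downloaded folders: {missing}")

    command = ["bash", script.name]
    if not dry_run:
        # Run from the main repo; check=True raises if the script exits non-zero.
        subprocess.run(command, cwd=repo, check=True)
    return command
```

Calling the helper with `dry_run=True` only validates the layout and returns the command it would run, which is a cheap way to confirm the downloads landed where expected before starting the build.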

For more details, please refer to the

Downloads

  • The 675 problems in the original paper are available here.
  • An extension with 37 additional problems is available here.
  • A reproduction package for the entire dataset is available here. It includes additional source data that is required to assemble the full dataset.

Contributing

If you'd like to contribute additional problems to the benchmark, please:

BibTeX

@article{zhong2023goal,
  title={Goal Driven Discovery of Distributional Differences via Language Descriptions},
  author={Zhong, Ruiqi and Zhang, Peter and Li, Steve and Ahn, Jinwoo and Klein, Dan and Steinhardt, Jacob},
  journal={arXiv preprint arXiv:2302.14233},
  year={2023}
}

About

License: MIT License


Languages

Python 99.9%, Shell 0.1%