tuetschek / e2e-eval

E2E NLG Challenge – outputs of participating systems & raw human ratings

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

E2E NLG Challenge Evaluation Files

This package contains outputs of participating systems and raw human ratings for the E2E NLG Challenge. The systems have been trained and tested on the E2E NLG Dataset to generate restaurant recommendation from a flat meaning representation (attribute-value sets).

See the Challenge website and the detailed results draft paper on arXiv for more information.

Contents

Directory structure

The directory structure is the following:

  • participants.tsv -- "legend" to labels for participating teams (these are used for direcory names and in system labels in the human rating files; see below for primary system variants)
  • human_ratings/ -- raw human ratings
  • system_outputs/primary/ -- outputs of primary systems (these were used for human ratings)
  • system_outputs/all/ -- outputs of all systems submitted to the challenge (including primary)

System output files

The system output files are TSV files with UTF-8 encoding, containing two tab-separated columns:

  • MR = the input MR
  • output = the corresponding system output

Raw human ratings files

In the human ratings CSV files, these columns are of interest:

  • mr = the input MR
  • sys1-sys5 = system labels
  • ref1-ref5 = corresponding system outputs
  • quality1-quality5 / natur1-natur5 = corresponding human ratings

Primary system labels

These are used in the human rating files (they correspond to the labeling used in the paper):

label affiliation architecture
adapt AdaptCentre seq2seq
chen Harbin Institute of Technology seq2seq
dangnt University of Information Technology VNU-HCM rule-based
forge1 Pompeu Fabra University rule-based
forge3 Pompeu Fabra University templates
gong Harbin Institute of Technology seq2seq
harv HarvardNLP seq2seq
nle NAVER Labs Europe seq2seq
sheff1 Sheffield NLP data-driven
sheff2 Sheffield NLP seq2seq
slug UC Santa Cruz (Slug2Slug) seq2seq
slug-alt UC Santa Cruz (Slug2Slug) seq2seq
tgen Heriot-Watt University (baseline) seq2seq
tnt1 UC Santa Cruz (TNT-NLG) seq2seq
tnt2 UC Santa Cruz (TNT-NLG) seq2seq
tr1 Thomson Reuters NLP seq2seq
tr2 Thomson Reuters NLP templates
tuda UKP TU Darmstadt templates
zhang Xiamen University seq2seq
zhaw1 Zürcher Hochschule für Angewandte Wissenschaften data-driven
zhaw2 Zürcher Hochschule für Angewandte Wissenschaften data-driven

Contact & Citing

Ondřej Dušek, Jekaterina Novikova & Verena Rieser, Heriot-Watt University

To cite this data, please refer to this paper:

@article{dusek_evaluating_2019,
	title = {Evaluating the {State}-of-the-{Art} of {End}-to-{End} {Natural} {Language} {Generation}: {The} {E}2E {NLG} {Challenge}},
	journal = {arXiv:1901.07931 [cs]},
	author = {Dušek, Ondřej and Novikova, Jekaterina and Rieser, Verena},
	month = jan,
	year = {2019},
	url = {http://arxiv.org/abs/1901.07931},

About

E2E NLG Challenge – outputs of participating systems & raw human ratings