Datasets created for the paper "Improving Truthfulness of Headline Generation".
The `gigaword` directory contains the results of the annotation we conducted on a part of the Gigaword dataset.
`giga_entail_annotation.tsv` contains 1,000 annotation records, each with the id of a Gigaword article.
`giga_entail_annotation_filtered.tsv` is a subset of `giga_entail_annotation.tsv`.
This subset was created with the same filtering procedure as Rush et al. (2015).
We used this version to report the entailment ratio of the Gigaword dataset in Section 3.2 of our paper.
The meaning of each column is:
Header | Description |
---|---|
id | The id of the article in the original English Gigaword dataset (Graff and Cieri, 2003; Napoles et al., 2012) |
lead1_worker{1-3} | The result of worker {1-3} determining whether the first sentence of the article entails its headline. 1 is entailment, 2 is non-entailment, and 3 is incomprehensible. |
full_worker{1-3} | Same as lead1_worker{1-3}, but the full article is used instead of the first sentence (lead-1). |
lead1_result | Majority vote among the three lead1_worker annotations. If all three workers gave different labels, the result is 0. |
full_result | Same as lead1_result, but computed from the full_worker annotations. |
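
The column layout above can be read with a few lines of Python. The following is a minimal sketch, not the code used in the paper: it assumes the TSV files start with a header row whose names match the table (`lead1_worker1`, `full_worker1`, and so on), and the helper names `majority_vote`, `read_annotations`, and `entailment_ratio` are hypothetical. It recomputes the majority vote from the three worker columns and reports the fraction of records labeled as entailment.

```python
import csv
from collections import Counter


def majority_vote(labels):
    """Return the label given by at least two workers, or 0 if all differ."""
    label, count = Counter(labels).most_common(1)[0]
    return label if count >= 2 else 0


def read_annotations(handle):
    """Parse an open TSV handle into a list of dict rows.

    Assumption: the first line is a header row with the column names
    listed in the table (id, lead1_worker1, ..., full_result).
    """
    return list(csv.DictReader(handle, delimiter="\t"))


def entailment_ratio(rows, prefix="lead1"):
    """Fraction of rows whose majority label is 1 (entailment).

    `prefix` is "lead1" or "full", selecting which worker columns to use.
    """
    votes = [
        majority_vote([int(row[f"{prefix}_worker{i}"]) for i in (1, 2, 3)])
        for row in rows
    ]
    if not votes:
        return 0.0
    return sum(vote == 1 for vote in votes) / len(votes)
```

To apply it, open `gigaword/giga_entail_annotation_filtered.tsv` with `newline=""`, pass the handle to `read_annotations`, and call `entailment_ratio(rows, "lead1")` or `entailment_ratio(rows, "full")`. If the header assumption holds, the recomputed votes should agree with the `lead1_result` and `full_result` columns.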