nlp-titech / headline-entailment

Datasets created in the paper "Improving Truthfulness of Headline Generation"

headline-entailment

Datasets created in the paper Improving Truthfulness of Headline Generation.

Gigaword Entailment Dataset

The gigaword directory contains the results of annotations we conducted on a part of the Gigaword dataset.

giga_entail_annotation.tsv includes 1,000 records, each containing the annotation results and the id of a Gigaword article.

giga_entail_annotation_filtered.tsv is a subset of giga_entail_annotation.tsv, created by applying the same filtering procedure as Rush et al. (2015). We used this version to report the entailment ratio of the Gigaword dataset in Section 3.2 of our paper.
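As a minimal sketch of how either file might be read, the snippet below parses a tab-separated file into a list of dictionaries using only the Python standard library. It assumes the file is UTF-8 encoded and that its first row holds the column names; the helper name `load_annotations` is hypothetical and not part of this repository.

```python
import csv
from pathlib import Path


def load_annotations(path):
    """Read a tab-separated annotation file into a list of dicts.

    Assumption: the first row of the file is a header row, so each
    record is keyed by column name. All values are returned as strings.
    """
    with Path(path).open(newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f, delimiter="\t"))
```

For example, `load_annotations("gigaword/giga_entail_annotation.tsv")` would return one dictionary per annotated article, provided the file sits at that path relative to the working directory.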

The meaning of each column is:

| Header | Description |
| --- | --- |
| id | The id of the article in the original English Gigaword dataset (Graff and Cieri, 2003; Napoles et al., 2012) |
| lead1_worker{1-3} | The judgment of worker {1-3} on whether the first sentence of the article entails its headline. 1 is entailment, 2 is non-entailment, and 3 is incomprehensible. |
| full_worker{1-3} | Same as lead1_worker{1-3}, but the full article is used instead of the first sentence (lead-1). |
| lead1_result | Majority vote among the three lead1_worker annotations. If all three workers give different labels, the result is 0. |
| full_result | Same as lead1_result, but computed from the full_worker annotations. |
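The majority-vote rule described for lead1_result and full_result can be sketched as follows. This is an illustration only, not the authors' script: the function names `majority_vote` and `entailment_ratio` are hypothetical, and the ratio shown here counts every record in the denominator, including those with result 0 or 3, which may differ from the exact computation used in the paper.

```python
from collections import Counter


def majority_vote(labels):
    """Return the label given by more than half of the workers.

    With three workers, this returns 0 when all three labels differ,
    matching the rule described for lead1_result and full_result.
    """
    label, count = Counter(labels).most_common(1)[0]
    return label if count * 2 > len(labels) else 0


def entailment_ratio(results):
    """Fraction of majority-vote results equal to 1 (entailment).

    Assumption: every record counts toward the denominator.
    """
    results = list(results)
    if not results:
        return 0.0
    return sum(1 for r in results if r == 1) / len(results)
```

Values read from the TSV files are strings, so they would need to be converted with `int()` before being passed to these functions.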
