Benchmarking Large Language Models for News Summarization
This repository contains the data release for the paper Benchmarking Large Language Models for News Summarization.
Likert evaluation data
likert_evaluation_results.jsonl contains the results of the Likert evaluation in Table 2.
The file should be loaded as a single JSON file; it is a List[Dict].
Each dictionary contains the following keys:
model: the model name
article: the text of the article
summary: the summary output by the model
dataset: the dataset name, either "cnndm" or "xsum"
faithfulness: the faithfulness score given by the annotator. The score is binary (0 or 1).
coherence: the coherence score given by the annotator. The score ranges from 1 to 5.
relevance: the relevance score given by the annotator. The score ranges from 1 to 5.
annotation_id: the annotation id
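As a rough illustration of how these keys might be used, the sketch below loads the file with a single json.load call and averages the three scores per dataset and model. The helper names load_records and mean_scores_by_model are hypothetical and not part of this release, and the sketch assumes the scores are stored as numbers or numeric strings.

```python
import json
from collections import defaultdict
from statistics import mean

METRICS = ("faithfulness", "coherence", "relevance")


def load_records(path):
    """Load the whole file as one JSON document (a list of dicts)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def mean_scores_by_model(records):
    """Average each Likert score for every (dataset, model) pair."""
    grouped = defaultdict(lambda: defaultdict(list))
    for rec in records:
        key = (rec["dataset"], rec["model"])
        for metric in METRICS:
            # float() is an assumption: it tolerates ints and numeric strings.
            grouped[key][metric].append(float(rec[metric]))
    return {
        key: {metric: mean(values) for metric, values in scores.items()}
        for key, scores in grouped.items()
    }
```

Under these assumptions, mean_scores_by_model(load_records("likert_evaluation_results.jsonl")) would return one dictionary of averaged scores for each (dataset, model) pair.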
Pairwise evaluation data
pairwise_evaluation_results.jsonl contains the results of the pairwise evaluation in Figure 5.
The file should be loaded as a single JSON file; it is a List[Dict].
Each dictionary contains the following keys:
article_id: unique identifier for the article
writer_id: unique identifier for the writer who wrote the summary
evaluator_id: unique identifier for the evaluator who made the pairwise comparison
article_text: the text of the article
writer_summary: the summary written by the writer
text-davinci-002_summary: the summary generated by the model text-davinci-002
overall_writer_better: whether the writer summary is better overall than the model summary. The value is one of True, False, or Equally Good.
informative_writer_better: whether the writer summary is better than the model summary in terms of informativeness. The value is one of True, False, or Equally Good.
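The preference keys above lend themselves to a simple tally. The following is a minimal sketch, not the analysis code used for the paper: tally_preferences is a hypothetical helper, and because it is not stated whether True and False are stored as JSON booleans or as strings, the sketch converts every value to a string before counting.

```python
import json
from collections import Counter


def tally_preferences(path, key="overall_writer_better"):
    """Count how often each outcome occurs for one comparison key."""
    with open(path, encoding="utf-8") as f:
        records = json.load(f)  # the file is one JSON list of dicts
    # str() is an assumption that makes JSON booleans and the strings
    # "True" / "False" / "Equally Good" land in the same buckets.
    return Counter(str(rec[key]) for rec in records)
```

Passing key="informative_writer_better" would tally the informativeness judgments instead of the overall ones.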
All freelance writer summaries
Because we did not evaluate all summaries written by the freelance writers, we release a separate file with all the summaries.
writer_summaries.jsonl is a List[Dict]; each dictionary contains the following keys:
article_id: unique identifier for the article
article: the text of the article
summary: the summary written by the freelance writer
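To look up writer summaries by article, the records can be grouped on article_id. This sketch assumes the file loads as a single JSON list like the two evaluation files, and it stores a list per article because it is not stated here whether an article can have more than one summary. The function name summaries_by_article is hypothetical.

```python
import json
from collections import defaultdict


def summaries_by_article(path):
    """Map each article_id to the list of writer summaries for it."""
    with open(path, encoding="utf-8") as f:
        records = json.load(f)  # assumed: one JSON list of dicts
    index = defaultdict(list)
    for rec in records:
        index[rec["article_id"]].append(rec["summary"])
    return dict(index)
```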
Authors and citation
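This work was done by Tianyi Zhang, Faisal Ladhak, Esin Durmus, Percy Liang, Kathleen McKeown, and Tatsunori B. Hashimoto.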
If you find this data useful, please cite the following paper:
@misc{https://doi.org/10.48550/arxiv.2301.13848,
url = {https://arxiv.org/abs/2301.13848},
author = {Zhang, Tianyi and Ladhak, Faisal and Durmus, Esin and Liang, Percy and McKeown, Kathleen and Hashimoto, Tatsunori B.},
title = {Benchmarking Large Language Models for News Summarization},
publisher = {arXiv},
year = {2023},
}