Benchmarking Large Language Models for News Summarization

This repository contains the data release for the paper Benchmarking Large Language Models for News Summarization.

Likert evaluation data

likert_evaluation_results.jsonl contains the results of the Likert evaluation in Table 2. Despite the .jsonl extension, the file should be loaded as a single JSON document; it is a List[Dict]. Each dictionary contains the following keys:

model: the model name
article: the text of the article
summary: model output summary
dataset: the dataset name, either "cnndm" or "xsum"
faithfulness: the faithfulness score given by the annotator. The score is binary (0 or 1).
coherence: the coherence score given by the annotator. The score ranges from 1 to 5.
relevance: the relevance score given by the annotator. The score ranges from 1 to 5.
annotation_id: the annotation id.
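
As a rough illustration of working with this file, the sketch below loads it as one JSON document and averages the three scores per model. The helper names load_records and mean_scores_by_model are hypothetical and not part of the release, and the code assumes the score fields are numeric or numeric strings.

```python
import json
from collections import defaultdict
from statistics import mean

SCORE_KEYS = ("faithfulness", "coherence", "relevance")


def load_records(path):
    """Load a release file stored as one JSON array (not one object per line)."""
    with open(path, encoding="utf-8") as f:
        records = json.load(f)
    if not isinstance(records, list):
        raise ValueError(f"expected a JSON list in {path}, got {type(records).__name__}")
    return records


def mean_scores_by_model(records, dataset=None):
    """Average each Likert score per model, optionally restricted to one dataset."""
    grouped = defaultdict(list)
    for record in records:
        if dataset is not None and record["dataset"] != dataset:
            continue
        grouped[record["model"]].append(record)
    # float() is an assumption: it tolerates scores stored as ints or numeric strings.
    return {
        model: {key: mean(float(row[key]) for row in rows) for key in SCORE_KEYS}
        for model, rows in grouped.items()
    }
```

For example, mean_scores_by_model(load_records("likert_evaluation_results.jsonl"), dataset="xsum") would return one dictionary of mean scores per model for the XSum articles. These averages are not guaranteed to match Table 2 exactly, since the paper's aggregation may differ.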

Pairwise evaluation data

pairwise_evaluation_results.jsonl contains the results of the pairwise evaluation in Figure 5. Despite the .jsonl extension, the file should be loaded as a single JSON document; it is a List[Dict]. Each dictionary contains the following keys:

article_id: unique identifier for the article
writer_id: unique identifier for the writer of the writer summary
evaluator_id: unique identifier for the evaluator for the pairwise comparison
article_text: the text of the article
writer_summary: the summary written by the writer
text-davinci-002_summary: the summary generated by the model text-davinci-002
overall_writer_better: whether the writer summary is better overall than the model summary. The value is one of True, False, or Equally Good.
informative_writer_better: whether the writer summary is better than the model summary in terms of informativeness. The value is one of True, False, or Equally Good.
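
A minimal sketch of summarizing the pairwise judgments follows. The function name tally_preferences is hypothetical. Because this README does not say whether True and False are stored as JSON booleans or as strings, the code converts every value with str() before counting; that normalization is an assumption.

```python
from collections import Counter


def tally_preferences(records, key="overall_writer_better"):
    """Return the fraction of comparisons for each outcome label under `key`."""
    # str() maps both the boolean True and the string "True" to the label "True".
    counts = Counter(str(record[key]) for record in records)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}
```

Passing key="informative_writer_better" gives the same breakdown for informativeness. An empty list of records yields an empty dictionary.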

All freelance writer summaries

Because we did not evaluate all summaries written by the freelance writers, we release a separate file with all of them. writer_summaries.jsonl is a List[Dict]; each dictionary contains the following keys:

article_id: unique identifier for the article
article: the text of the article
summary: the summary written by the freelance writer
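
One way to find the writer summaries that were not part of the pairwise evaluation is sketched below. The function name unevaluated_summaries is hypothetical, and the approach rests on an unverified assumption: that a summary in writer_summaries.jsonl and its counterpart in pairwise_evaluation_results.jsonl share the same article_id and the same text up to surrounding whitespace.

```python
def unevaluated_summaries(writer_records, pairwise_records):
    """Return writer summaries with no matching (article_id, text) pair in the pairwise data."""
    # Assumption: the two files use identical article_id values and summary text.
    evaluated = {
        (record["article_id"], record["writer_summary"].strip())
        for record in pairwise_records
    }
    return [
        record
        for record in writer_records
        if (record["article_id"], record["summary"].strip()) not in evaluated
    ]
```

Both arguments are the lists obtained by loading the corresponding files as single JSON documents.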

Authors and citation

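This work was done by Tianyi Zhang, Faisal Ladhak, Esin Durmus, Percy Liang, Kathleen McKeown, and Tatsunori B. Hashimoto.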

If you find this data useful, please cite the following paper:

@misc{https://doi.org/10.48550/arxiv.2301.13848,
  url = {https://arxiv.org/abs/2301.13848},
  author = {Zhang, Tianyi and Ladhak, Faisal and Durmus, Esin and Liang, Percy and McKeown, Kathleen and Hashimoto, Tatsunori B.},
  title = {Benchmarking Large Language Models for News Summarization},
  publisher = {arXiv},
  year = {2023},
}
