Tiiiger / benchmark_llm_summarization

Benchmarking Large Language Models for News Summarization

This repository contains the data release for the paper Benchmarking Large Language Models for News Summarization.

Likert evaluation data

likert_evaluation_results.jsonl contains the results of the Likert evaluation in Table 2. Load the file as a single JSON document rather than line by line; it parses to a List[Dict]. Each dictionary contains the following keys:

  • model: the model name
  • article: the text of the article
  • summary: the summary generated by the model
  • dataset: the dataset name, either "cnndm" or "xsum"
  • faithfulness: the faithfulness score given by the annotator. The score is binary (0 or 1).
  • coherence: the coherence score given by the annotator. The score is from 1 to 5.
  • relevance: the relevance score given by the annotator. The score is from 1 to 5.
  • annotation_id: the annotation id.
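
As an illustration of the schema above, here is a minimal sketch that loads the file with a single json.load call and averages the three scores per dataset and model. The helper names load_records and mean_likert_scores are hypothetical, the scores are assumed to be stored as numbers, and this grouping is not necessarily how Table 2 was computed.

```python
import json
from collections import defaultdict
from statistics import mean


def load_records(path):
    """Load the whole file as one JSON document (a list of dicts)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def mean_likert_scores(records):
    """Average faithfulness, coherence, and relevance per (dataset, model)."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[(rec["dataset"], rec["model"])].append(rec)
    return {
        key: {
            # float() is a guard in case scores were serialized as strings
            # (an assumption; the README does not state the stored type).
            metric: mean(float(r[metric]) for r in recs)
            for metric in ("faithfulness", "coherence", "relevance")
        }
        for key, recs in grouped.items()
    }
```

Calling mean_likert_scores(load_records("likert_evaluation_results.jsonl")) returns a dictionary keyed by (dataset, model) pairs.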

Pairwise evaluation data

pairwise_evaluation_results.jsonl contains the results of the pairwise evaluation in Figure 5. Load the file as a single JSON document rather than line by line; it parses to a List[Dict]. Each dictionary contains the following keys:

  • article_id: unique identifier for the article
  • writer_id: unique identifier for the writer of the writer summary
  • evaluator_id: unique identifier for the evaluator who made the pairwise comparison
  • article_text: the text of the article
  • writer_summary: the summary written by the writer
  • text-davinci-002_summary: the summary generated by the model text-davinci-002
  • overall_writer_better: whether the writer summary is better overall than the model summary. The value is one of True, False, or Equally Good.
  • informative_writer_better: whether the writer summary is more informative than the model summary. The value is one of True, False, or Equally Good.
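
The two comparison keys can be tallied with a few lines of Python. The following is a minimal sketch: tally_preferences is a hypothetical helper, and the str() conversion is a guard because this README does not say whether True and False are stored as JSON booleans or as strings.

```python
from collections import Counter


def tally_preferences(records, key="overall_writer_better"):
    """Count how often each outcome occurs for one comparison key."""
    # str() maps a JSON boolean true/false to "True"/"False" and leaves
    # string values such as "Equally Good" unchanged.
    return Counter(str(rec[key]) for rec in records)
```

Pass key="informative_writer_better" to tally the informativeness judgments instead of the overall ones.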

All freelance writer summaries

Because we did not evaluate all summaries written by the freelance writers, we release a separate file with all the summaries. writer_summaries.jsonl is a List[Dict]; each dictionary contains the following keys:

  • article_id: unique identifier for the article
  • article: the text of the article
  • summary: the summary written by the freelance writer
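
A minimal sketch of how these records might be used: group the summaries by article, and find articles that have writer summaries but no pairwise judgment. Both helper names are hypothetical, and the sketch assumes that article_id values are shared between writer_summaries.jsonl and pairwise_evaluation_results.jsonl, which this README does not confirm.

```python
from collections import defaultdict


def summaries_by_article(records):
    """Map each article_id to the list of writer summaries for it."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec["article_id"]].append(rec["summary"])
    return dict(grouped)


def unevaluated_article_ids(writer_records, pairwise_records):
    """Return article_ids present in the writer file but not in the pairwise file."""
    # Assumes both files use the same article_id scheme (unverified).
    evaluated = {rec["article_id"] for rec in pairwise_records}
    return {rec["article_id"] for rec in writer_records} - evaluated
```

Both functions take the lists produced by loading the corresponding files with json.load.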

Authors and citation

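This work was done by Tianyi Zhang, Faisal Ladhak, Esin Durmus, Percy Liang, Kathleen McKeown, and Tatsunori B. Hashimoto.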

If you find this data useful, please cite the following paper:

@misc{https://doi.org/10.48550/arxiv.2301.13848,
  url = {https://arxiv.org/abs/2301.13848},
  author = {Zhang, Tianyi and Ladhak, Faisal and Durmus, Esin and Liang, Percy and McKeown, Kathleen and Hashimoto, Tatsunori B.},
  title = {Benchmarking Large Language Models for News Summarization},
  publisher = {arXiv},
  year = {2023},
}
