bloomberg / entsum

Open Source / ENTSUM: A Data Set for Entity-Centric Extractive Summarization

EntSUM: A dataset for entity centric summarization

Repository for pre-processing code related to generating the training datasets used in the paper.

Using this repository

The repository contains four notebooks.

Datasets

CNN/DailyMail and NYT can be used to train models. Entity-centric summarization training sets are built from them using the methods described in the paper and the notebooks in this repository.
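As a rough illustration of what turning a generic summarization pair into entity-centric training examples can look like, the sketch below keeps only the reference-summary sentences that mention a given entity. This is an assumption made for illustration, not the notebooks' actual code or necessarily the paper's exact procedure: the function names `split_sentences` and `entity_centric_pairs`, the regex sentence splitter, and the plain string matching of entities are all hypothetical simplifications (a real pipeline would more likely use a named entity recognizer and coreference handling).

```python
import re
from typing import Dict, List


def split_sentences(text: str) -> List[str]:
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def entity_centric_pairs(
    document: str, summary: str, entities: List[str]
) -> List[Dict[str, str]]:
    """Build one (entity, document, summary) example per entity.

    Hypothetical heuristic: the entity-centric summary is the subset of
    reference-summary sentences that mention the entity by name.
    Entities with no matching sentence are skipped.
    """
    summary_sentences = split_sentences(summary)
    examples = []
    for entity in entities:
        # Case-insensitive whole-word match on the entity string.
        pattern = re.compile(r"\b" + re.escape(entity) + r"\b", re.IGNORECASE)
        selected = [s for s in summary_sentences if pattern.search(s)]
        if selected:
            examples.append(
                {"entity": entity, "document": document, "summary": " ".join(selected)}
            )
    return examples


# Toy input, invented for this example only.
doc = "Alice Smith won the city race on Sunday. Bob Jones came in second."
ref = "Alice Smith won the race. Bob Jones finished second. Alice Smith thanked her coach."
pairs = entity_centric_pairs(doc, ref, ["Alice Smith", "Bob Jones", "Carol"])
for pair in pairs:
    print(pair["entity"], "->", pair["summary"])
```

Each resulting example pairs the full document with a summary restricted to one entity, so a single generic document-summary pair can yield several training examples.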

The EntSUM dataset is used to evaluate the effectiveness of these trained entity-centric summarization models.

License

The EntSUM code is distributed under the Apache License (version 2.0); see the LICENSE file at the top of the source tree for more information.

Note: To run the code and download the datasets, please obtain the respective license for each dataset.

Citation

@inproceedings{maddela-etal-2022-entsum,
    title = "{E}nt{SUM}: A Data Set for Entity-Centric Extractive Summarization",
    author = "Maddela, Mounica  and
      Kulkarni, Mayank  and
      Preotiuc-Pietro, Daniel",
    booktitle = "Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = may,
    year = "2022",
    address = "Dublin, Ireland",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.acl-long.237",
    pages = "3355--3366",
    abstract = "Controllable summarization aims to provide summaries that take into account user-specified aspects and preferences to better assist them with their information need, as opposed to the standard summarization setup which build a single generic summary of a document. We introduce a human-annotated data set EntSUM for controllable summarization with a focus on named entities as the aspects to control. We conduct an extensive quantitative analysis to motivate the task of entity-centric summarization and show that existing methods for controllable summarization fail to generate entity-centric summaries. We propose extensions to state-of-the-art summarization approaches that achieve substantially better results on our data set. Our analysis and results show the challenging nature of this task and of the proposed data set.",
}
