Text summarization starting from scratch.
This repository is updated continuously.
Table of Contents
- Basic Concept
- Sentence Summarization
- Unsupervised Abstractive Summarization
- Multi Document Summarization
- Evaluation Metrics
- Other Resources
Summarization is the task of producing a shorter version of one or several documents that preserves most of the input's meaning.
Types of summarization
Extractive summaries (extracts) are produced by concatenating several sentences taken exactly as they appear in the materials being summarized.
Abstractive summaries (abstracts) are written to convey the main information in the input and may reuse phrases or clauses from it, but overall they are expressed in the words of the summary author.
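To make the extractive case concrete, here is a minimal sketch in Python. It assumes a naive regex sentence splitter and scores each sentence by the average document frequency of its words; both choices, and the function names `split_sentences` and `extractive_summary`, are illustrative assumptions and are not taken from any of the papers listed in this repository.

```python
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extractive_summary(text, k=2):
    """Return the k highest-scoring sentences, verbatim and in document order."""
    sentences = split_sentences(text)
    tokenized = [re.findall(r"[a-z0-9]+", s.lower()) for s in sentences]
    # Word frequencies over the whole document.
    freq = Counter(w for words in tokenized for w in words)

    def score(words):
        # Average document frequency of the words in one sentence.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(tokenized[i]), reverse=True)
    chosen = sorted(ranked[:k])  # restore original sentence order
    return " ".join(sentences[i] for i in chosen)


if __name__ == "__main__":
    doc = "Cats sleep a lot. Cats eat fish. Dogs bark."
    print(extractive_summary(doc, k=1))
```

Because the output is a concatenation of unmodified input sentences, this is an extract in the sense defined above. An abstractive system would instead generate new wording.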
Summary Informativeness evaluation
- ROUGE-N: measures the N-gram units common between a particular summary and a collection of reference summaries, where N determines the N-gram’s length. E.g., ROUGE-1 for unigrams and ROUGE-2 for bigrams.
- ROUGE-L: computes a score based on the Longest Common Subsequence (LCS) between the summary and the reference.
- BLEU : calculated from the n-gram co-occurrence between the generated summary and the gold summary. Unlike ROUGE, you don't need to specify the "n", because BLEU combines several n-gram orders (commonly 1 to 4) into one score.
- METEOR : based on the harmonic mean of unigram precision and recall, with recall weighted higher than precision.
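The ROUGE definitions above can be sketched in a few lines of Python. This is a simplified illustration, not the official ROUGE package: it assumes a single reference summary, lowercased whitespace tokenization, and no stemming or stopword removal. `rouge_n` returns recall, and `rouge_l` returns a balanced F1 over LCS-based precision and recall; the function names are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    # Multiset of n-grams; empty when the text is shorter than n tokens.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """ROUGE-N recall: clipped n-gram overlap divided by the reference n-gram count."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((cand & ref).values())
    return overlap / total


def lcs_length(a, b):
    # Standard dynamic program for LCS length, keeping one row at a time.
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """ROUGE-L as F1 of LCS-based precision and recall (assumes beta = 1)."""
    c, r = candidate.lower().split(), reference.lower().split()
    lcs = lcs_length(c, r)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    cand = "the cat sat on the mat"
    ref = "the cat is on the mat"
    print(rouge_n(cand, ref, 1), rouge_n(cand, ref, 2), rouge_l(cand, ref))
```

For this pair, 5 of the 6 reference unigrams and 3 of the 5 reference bigrams are matched, and the longest common subsequence ("the cat on the mat") has length 5. Published results are normally computed with an established ROUGE implementation, so scores from this sketch are not directly comparable to them.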
- for sentence summarization
- for document summarization
- is a large dataset for training and evaluating summarization systems. It contains 1.3 million articles and summaries written by authors and editors in the newsrooms of 38 major publications. The summaries are obtained from search and social metadata between 1998 and 2017 and use a variety of summarization strategies combining extraction and abstraction.
Large corpus of uncompressed and compressed sentences from news articles.
Abstractive Document summarization
1.words-lvt2k-temp-att (Nallapati et al., 2016) : Abstractive Text Summarization using Sequence-to-sequence RNNs and Beyond
2.Graph-Based Attn : Abstractive Document Summarization with a Graph-Based Attentional Neural Model
3.Pointer-generator + coverage (See et al., 2017) : Get To The Point: Summarization with Pointer-Generator Networks
4.KIGN+Prediction-guide : Guiding Generation for Abstractive Text Summarization based on Key Information Guide Network
5.Explicit Info Selection Modeling(Li et al., 2018a) : Improving Neural Abstractive Document Summarization with Explicit Information Selection Modeling
6.Structural Regularization(Li et al., 2018b) : Improving Neural Abstractive Document Summarization with Structural Regularization
7.end2end w/ inconsistency loss (Hsu et al., 2018): A Unified Model for Extractive and Abstractive Summarization using Inconsistency Loss
8.Pointer + Coverage + EntailmentGen + QuestionGen (Guo et al., 2018) : Soft Layer-Specific Multi-Task Summarization with Entailment and Question Generation
Reinforcement learning based:
1.ML+RL ROUGE+Novel, with LM (Kryscinski et al., 2018) : Improving Abstraction in Text Summarization
2.RL + pg + cbdec (Jiang and Bansal, 2018): Closed-Book Training to Improve Summarization Encoder Memory
3.rnn-ext + abs + RL + rerank (Chen and Bansal, 2018): Fast Abstractive Summarization with Reinforce-Selected Sentence Rewriting
4.ML+RL, with intra-attention : A Deep Reinforced Model for Abstractive Summarization
7.DCA (Celikyilmaz et al., 2018) : Summarization
8.ROUGESal+Ent RL (Pasunuru and Bansal, 2018): Multi-Reward Reinforced Summarization with Saliency and Entailment
Extractive Document summarization
1.TEXTRANK (graph based): TextRank: Bringing Order into Texts
3.NN-SE : Neural summarization by extracting sentences and words
5.NeuSUM (Zhou et al., 2018) : Neural Document Summarization by Jointly Learning to Score and Select Sentences
6.Latent (Zhang et al., 2018) : Neural Latent Extractive Document Summarization
Reinforcement learning based
1.rnn-ext + RL (Chen and Bansal, 2018): Fast Abstractive Summarization with Reinforce-Selected Sentence Rewriting
2.Bottom-Up Summarization (Gehrmann et al., 2018): Bottom-Up Abstractive Summarization
7.RNES w/o coherence : Learning to Extract Coherent Summary via Deep Reinforcement Learning
1.Re^3 Sum (Cao et al., 2018) : Retrieve, Rerank and Rewrite: Soft Template Based Neural Summarization
2.FTSum_g (Cao et al., 2018) : Faithful to the Original: Fact Aware Neural Abstractive Summarization
3.Seq2seq + E2T_cnn (Amplayo et al., 2018) : Abstractive Sentence Summarization with Attentive Recurrent Neural Networks
4.EncDec+WFE (Suzuki and Nagata, 2017) : Cutting-off Redundant Repeating Generations for Neural Abstractive Summarization
5.DRGD (Li et al., 2017) : Deep Recurrent Generative Decoder for Abstractive Text Summarization
6.BiRNN + LM Evaluator (Zhao et al. 2018) : A Language Model based Evaluator for Sentence Compression
Unsupervised Abstractive Summarization
2.Semantic Abstractive Sum based AMR(2018 Dohare): Unsupervised Semantic Abstractive Summarization
3.Paraphrastic Sentence Fusion Model(2018 Nayeem): Abstractive Unsupervised Multi-Document Summarization using Paraphrastic Sentence Fusion
Multi Document Summarization
1.(Z Cao 2017) : Improving Multi-Document Summarization via Text Classification
1.ROUGE(2004) : Rouge: A package for automatic evaluation of summaries
4.Pyramid Method(2007) : Evaluating Content Selection in Summarization: The Pyramid Method
6.(2018 Honda) : Pruning Basic Elements for Better Automatic Evaluation of Summaries
- A guide to tackling text summarization
- A curated list of resources dedicated to text summarization
SOTA in summarization : The current state-of-the-art