yuqianghan / Multi-News

Large-scale multi-document summarization dataset and code

Multi-News

Data and code for the ACL 2019 paper Multi-News: a Large-Scale Multi-Document Summarization Dataset and Abstractive Hierarchical Model.

Data

Preprocessed, but not truncated, data
Preprocessed and truncated data
Raw data (the only changes: each \n replaced with "NEWLINE_CHAR", and "|||||" appended to the end of each story)
Raw data, bad retrievals removed -- removes documents that were retrieved with errors (as noted in this issue) and removes the trailing "|||||" from the end of each example
Raw data -- zipped
TensorFlow Datasets
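The raw files encode each example's source documents with two markers: "NEWLINE_CHAR" in place of line breaks, and "|||||" after each story. Below is a minimal sketch of how one might split a source example back into its individual documents. It assumes one example per line of the source file; the helper names `split_source` and `read_sources` are hypothetical and are not part of the repository's code. Whitespace around the markers is only stripped at document edges, not otherwise normalized.

```python
from typing import List

# Markers described in the Data list (assumed to appear verbatim in the files).
DOC_SEPARATOR = "|||||"
NEWLINE_MARKER = "NEWLINE_CHAR"


def split_source(line: str) -> List[str]:
    """Hypothetical helper: split one raw source example into its documents."""
    docs = []
    for chunk in line.split(DOC_SEPARATOR):
        # Restore real line breaks, then trim surrounding whitespace.
        chunk = chunk.replace(NEWLINE_MARKER, "\n").strip()
        # Skip the empty piece left after a trailing "|||||" separator.
        if chunk:
            docs.append(chunk)
    return docs


def read_sources(path: str) -> List[List[str]]:
    """Hypothetical helper: read a source file, assuming one example per line."""
    with open(path, encoding="utf-8") as f:
        return [split_source(line.rstrip("\n")) for line in f]
```

Because empty chunks are dropped, the same helper handles both the raw version (with a trailing "|||||") and the cleaned version (without it).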

Models and Summaries

Trained models
Model output

About

License: Other
