- RNN_RNN
- CNN_RNN
- Hierarchical Attention Networks
Requires pipenv. Use pip install pipenv if not installed.
pipenv install
pipenv shell
# train
python main.py -device 0 -batch_size 32 -model RNN_RNN -seed 1 -save_dir checkpoints/XXX.pt
# test
python main.py -device 0 -batch_size 1 -test -load_dir checkpoints/XXX.pt
For example, suppose you have some documents at some/folder/*/article.txt. The first step is to tokenize them and pack them into a JSON file. To do this, run:
python make_data.py "some/folder/*/article.txt" data/my_collection.json
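The tokenize-and-pack step can be pictured with the minimal sketch below. It is not the code in make_data.py: the regex tokenizer, the naive sentence splitter, the one-record-per-line layout, and the field names path and doc are all assumptions made for illustration.

```python
import glob
import json
import re


def tokenize(text):
    """Split text into word and punctuation tokens (a naive stand-in tokenizer)."""
    return re.findall(r"\w+|[^\w\s]", text)


def split_sentences(text):
    """Split on whitespace that follows '.', '!' or '?' (a naive sentence splitter)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def pack_documents(pattern, out_path):
    """Tokenize every file matching `pattern`; write one JSON object per line."""
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for path in sorted(glob.glob(pattern)):
            with open(path, encoding="utf-8") as f:
                text = f.read()
            sentences = [" ".join(tokenize(s)) for s in split_sentences(text)]
            # Hypothetical record layout: source path plus newline-joined sentences.
            record = {"path": path, "doc": "\n".join(sentences)}
            out.write(json.dumps(record) + "\n")
            count += 1
    return count
```

Under these assumptions, pack_documents("some/folder/*/article.txt", "data/my_collection.json") would return the number of documents written. Keeping the source path in each record is one simple way to let summaries be routed back to their original folders later.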
When running main.py, you can use the new option -num_tok to control the exact number of words in each output summary, or the existing -topk option (select k sentences).
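The difference between the two options can be sketched as follows. This is an illustration only, not the logic in main.py: it assumes each sentence already has a model score, that top-k output is restored to document order, and that the word budget is met by truncating the last selected sentence. The function names select_topk and select_num_tok are hypothetical.

```python
def select_topk(sentences, scores, k):
    """Keep the k highest-scoring sentences, restored to document order."""
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen = sorted(ranked[:k])
    return [sentences[i] for i in chosen]


def select_num_tok(sentences, scores, num_tok):
    """Take sentences in score order until exactly num_tok words are collected.

    The last sentence taken is truncated so the budget is never exceeded.
    If the document has fewer than num_tok words, all of them are returned.
    """
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    words = []
    for i in ranked:
        remaining = num_tok - len(words)
        if remaining <= 0:
            break
        words.extend(sentences[i].split()[:remaining])
    return " ".join(words)
```

A sentence-count limit keeps every selected sentence intact but lets summary length vary, while a word budget gives summaries of equal length at the cost of a possibly cut-off final sentence.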
If you're running PyTorch without CUDA, execute git apply no_cuda.patch first.
To untokenize the output summaries and send them to the same directories as their originals, run:
python put_back_summaries.py outputs/hyp/ "some/folder/*/"
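As a rough picture of what untokenizing involves, the sketch below reattaches punctuation and English clitics that whitespace tokenization split off. It is an assumption about the general technique, not the behavior of put_back_summaries.py, and the function name untokenize is hypothetical.

```python
import re


def untokenize(text):
    """Undo whitespace tokenization by reattaching punctuation to nearby words."""
    # Remove the space before closing punctuation: "world !" -> "world!"
    text = re.sub(r"\s+([.,;:!?%)\]])", r"\1", text)
    # Remove the space after opening brackets and currency signs: "( see" -> "(see"
    text = re.sub(r"([(\[$])\s+", r"\1", text)
    # Rejoin common English clitics: "is n't" -> "isn't", "John 's" -> "John's"
    text = re.sub(r"\s+(n't|'s|'re|'ve|'ll|'d|'m)\b", r"\1", text)
    return text
```

Rules like these are lossy: they cannot recover the original spacing around quotes or hyphens, which is why a script with access to the original files can do better.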
- RNN_RNN (checkpoints/RNN_RNN_seed_1.pt)
- CNN_RNN (checkpoints/CNN_RNN_seed_1.pt)
- AttnRNN (checkpoints/AttnRNN_seed_1.pt)
model | ROUGE-1 | ROUGE-2 | ROUGE-L |
---|---|---|---|
SummaRuNNer (Nallapati) | 26.2 | 10.8 | 14.4 |
RNN-RNN | 26.0 | 11.5 | 13.8 |
CNN-RNN | 25.8 | 11.3 | 13.8 |
Hierarchical Attn Net | 26.0 | 11.4 | 13.8 |
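For readers unfamiliar with the metric, ROUGE-N measures n-gram overlap between a generated summary and a reference. The sketch below computes the recall form with clipped counts on whitespace tokens. It is not the scorer used to produce the table above, and the table does not state which ROUGE variant or length limit was used, so the numbers are not expected to be reproducible with it. The function names are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(hypothesis, reference, n=1):
    """Fraction of reference n-grams that also appear in the hypothesis."""
    hyp = ngrams(hypothesis.split(), n)
    ref = ngrams(reference.split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Counter intersection clips each n-gram to the smaller of the two counts.
    overlap = sum((hyp & ref).values())
    return overlap / total
```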
- Baidu Cloud (百度云): https://pan.baidu.com/s/1LV3iuuH1NjxuAJd0iz14lA (password: ivzl)
- Google Drive: data.tar.gz
- Source data: Neural Summarization by Extracting Sentences and Words
- Thanks to @AlJohri for the contribution