SanaJahan / PreSumm-AMICorpus-DialSum

BertSum model fine-tuned with AMI DialSum Corpus (Baseline)


About

Disclaimer

The PreSumm model, presented in the EMNLP 2019 paper titled "Text Summarization with Pretrained Encoders" [original code], is not my work. Please credit the appropriate authors for that model.

Purpose of this repository

This repository fine-tunes the PreSumm abstractive summarizer (BertSumExtAbs) on the AMI DialSum meeting corpus to provide a baseline for dialogue summarization.

Contents

Requirements
How to Use
How to Cite

Requirements

Python 3.5.2, PyRouge [notes]

pip install -r requirements.txt

How to Use

  1. First run: for the first run, use a single GPU so the code can download the BERT model (pass -visible_gpus -1). Once the download finishes, you can kill the process and rerun the code with multiple GPUs; see the example command after this list.

  2. Download the best-performing PreSumm model: CNN/DM BertExtAbs.
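
As a rough illustration of step 1, a first run could look like the sketch below. The paths, log file name, and data location are placeholders rather than values prescribed by this repository; the flags are those of the upstream PreSumm train.py.

# first run on CPU / a single GPU so the BERT weights get downloaded (placeholder paths)
cd src
python train.py -task abs -mode train -bert_data_path BERT_DATA_PATH -model_path MODEL_PATH -visible_gpus -1 -log_file ../logs/first_run.log
# once the download completes, kill the process and restart with e.g. -visible_gpus 0,1,2,3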

A. Evaluate the untrained (not yet fine-tuned) BertSumExtAbs

Edit the script so it points to the directory where the BertSumExtAbs weights are saved, then run:

./src/load_custom_data_an_eval.sh
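
Under the hood, this evaluation amounts to a PreSumm test-mode run roughly like the following sketch; the checkpoint and data paths are placeholders, so consult the script itself for the settings it actually uses.

# evaluate the downloaded (not fine-tuned) BertExtAbs checkpoint (placeholder paths)
cd src
python train.py -task abs -mode test -test_from PATH_TO_BERTEXTABS_CHECKPOINT -bert_data_path BERT_DATA_PATH -result_path ../results/ami_untrained -log_file ../logs/eval_untrained.log -visible_gpus 0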

B. Fine-tune BertSumExtAbs on the AMI DialSum Meeting Corpus

B.1. Download CoreNLP and export:

export CLASSPATH=./stanford-corenlp-full-2018-10-05/stanford-corenlp-3.9.2.jar

B.2. Prepare dataset
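
This step is not spelled out here; as a rough guide, PreSumm-style data preparation follows the upstream preprocess.py pipeline sketched below. The AMI-to-raw-text conversion and all paths are assumptions, and the repository's own scripts may differ.

# tokenize raw text with Stanford CoreNLP (requires the CLASSPATH export from B.1)
cd src
python preprocess.py -mode tokenize -raw_path RAW_AMI_PATH -save_path TOKENIZED_AMI_PATH
# group the tokenized files into plain json files
python preprocess.py -mode format_to_lines -raw_path TOKENIZED_AMI_PATH -save_path JSON_AMI_PATH -n_cpus 1 -use_bert_basic_tokenizer false
# convert to the binary format consumed by train.py
python preprocess.py -mode format_to_bert -raw_path JSON_AMI_PATH -save_path BERT_DATA_PATH -lower -n_cpus 1 -log_file ../logs/preprocess.log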

B.3. Fine-tune the model on the AMI DialSum dataset (with modified settings such as train_steps, lr_bert, lr_dec, warmup_steps_*, ...)

./src/fine_tuning.sh
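
In essence, fine-tuning is an abstractive PreSumm training run that starts from the downloaded CNN/DM checkpoint. A minimal sketch follows; all paths and hyperparameter values are placeholders, not the settings used in fine_tuning.sh.

# continue training from the CNN/DM BertExtAbs checkpoint on the AMI data (placeholder values)
cd src
python train.py -task abs -mode train -bert_data_path BERT_DATA_PATH -model_path FINETUNED_MODEL_PATH \
  -train_from PATH_TO_BERTEXTABS_CHECKPOINT -sep_optim true -lr_bert 0.002 -lr_dec 0.2 \
  -warmup_steps_bert 2000 -warmup_steps_dec 1000 -train_steps 20000 -save_checkpoint_steps 2000 \
  -batch_size 140 -accum_count 5 -use_bert_emb true -use_interval true -max_pos 512 \
  -visible_gpus 0 -log_file ../logs/ami_finetune.log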

B.4. Evaluate

./src/eval.sh
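
To pick the best fine-tuned checkpoint rather than evaluate a single one, PreSumm's validate mode can sweep every checkpoint saved in the model directory; a hedged sketch with placeholder paths:

# score all saved checkpoints and report ROUGE for the best ones (placeholder paths)
cd src
python train.py -task abs -mode validate -test_all -bert_data_path BERT_DATA_PATH \
  -model_path FINETUNED_MODEL_PATH -result_path ../results/ami_finetuned \
  -log_file ../logs/ami_validate.log -visible_gpus 0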

Acknowledgement


License: MIT License

