SanaJahan / PreSumm-AMICorpus-DialSum

BertSum model fine-tuned with AMI DialSum Corpus (Baseline)


About

Disclaimer

The PreSumm model, presented in the EMNLP 2019 paper titled "Text Summarization with Pretrained Encoders" [original code], is not my work. Please credit the appropriate authors for that model.

Purpose of this repository

This repository fine-tunes the PreSumm abstractive summarizer (BertSumExtAbs) on the AMI DialSum meeting corpus to provide a baseline for dialogue summarization.

Contents

Requirements
How to Use
How to Cite

Requirements

Python 3.5.2, PyRouge [notes]

pip install -r requirements.txt

How to Use

  1. First run: for the first run, use a single GPU so the code can download the BERT model (pass -visible_gpus -1). Once the download finishes, you can kill the process and rerun the code with multiple GPUs; see the example command after this list.

  2. Download the best-performing PreSumm model: CNN/DM BertExtAbs.
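
As a rough illustration of step 1, a first run could look like the sketch below. The paths, log file name, and data location are placeholders rather than values prescribed by this repository; the flags are those of the upstream PreSumm train.py.

# first run on CPU / a single GPU so the BERT weights get downloaded (placeholder paths)
cd src
python train.py -task abs -mode train -bert_data_path BERT_DATA_PATH -model_path MODEL_PATH -visible_gpus -1 -log_file ../logs/first_run.log
# once the download completes, kill the process and restart with e.g. -visible_gpus 0,1,2,3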

A. Evaluate the untrained (not yet fine-tuned) BertSumExtAbs

Edit the script so it points to the directory where the BertSumExtAbs weights are saved, then run:

./src/load_custom_data_an_eval.sh
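
Under the hood, this evaluation amounts to a PreSumm test-mode run roughly like the following sketch; the checkpoint and data paths are placeholders, so consult the script itself for the settings it actually uses.

# evaluate the downloaded (not fine-tuned) BertExtAbs checkpoint (placeholder paths)
cd src
python train.py -task abs -mode test -test_from PATH_TO_BERTEXTABS_CHECKPOINT -bert_data_path BERT_DATA_PATH -result_path ../results/ami_untrained -log_file ../logs/eval_untrained.log -visible_gpus 0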

B. Fine-tune BertSumExtAbs on the AMI DialSum Meeting Corpus

B.1. Download CoreNLP and export:

export CLASSPATH=./stanford-corenlp-full-2018-10-05/stanford-corenlp-3.9.2.jar

B.2. Prepare dataset
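
This step is not spelled out here; as a rough guide, PreSumm-style data preparation follows the upstream preprocess.py pipeline sketched below. The AMI-to-raw-text conversion and all paths are assumptions, and the repository's own scripts may differ.

# tokenize raw text with Stanford CoreNLP (requires the CLASSPATH export from B.1)
cd src
python preprocess.py -mode tokenize -raw_path RAW_AMI_PATH -save_path TOKENIZED_AMI_PATH
# group the tokenized files into plain json files
python preprocess.py -mode format_to_lines -raw_path TOKENIZED_AMI_PATH -save_path JSON_AMI_PATH -n_cpus 1 -use_bert_basic_tokenizer false
# convert to the binary format consumed by train.py
python preprocess.py -mode format_to_bert -raw_path JSON_AMI_PATH -save_path BERT_DATA_PATH -lower -n_cpus 1 -log_file ../logs/preprocess.log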

B.3. Fine-tune the model on the AMI DialSum dataset (with modified settings such as train_steps, lr_bert, lr_dec, warmup_steps_*, ...)

./src/fine_tuning.sh
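
In essence, fine-tuning is an abstractive PreSumm training run that starts from the downloaded CNN/DM checkpoint. A minimal sketch follows; all paths and hyperparameter values are placeholders, not the settings used in fine_tuning.sh.

# continue training from the CNN/DM BertExtAbs checkpoint on the AMI data (placeholder values)
cd src
python train.py -task abs -mode train -bert_data_path BERT_DATA_PATH -model_path FINETUNED_MODEL_PATH \
  -train_from PATH_TO_BERTEXTABS_CHECKPOINT -sep_optim true -lr_bert 0.002 -lr_dec 0.2 \
  -warmup_steps_bert 2000 -warmup_steps_dec 1000 -train_steps 20000 -save_checkpoint_steps 2000 \
  -batch_size 140 -accum_count 5 -use_bert_emb true -use_interval true -max_pos 512 \
  -visible_gpus 0 -log_file ../logs/ami_finetune.log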

B.4. Evaluate

./src/eval.sh
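
To pick the best fine-tuned checkpoint rather than evaluate a single one, PreSumm's validate mode can sweep every checkpoint saved in the model directory; a hedged sketch with placeholder paths:

# score all saved checkpoints and report ROUGE for the best ones (placeholder paths)
cd src
python train.py -task abs -mode validate -test_all -bert_data_path BERT_DATA_PATH \
  -model_path FINETUNED_MODEL_PATH -result_path ../results/ami_finetuned \
  -log_file ../logs/ami_validate.log -visible_gpus 0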

Acknowledgement


License: MIT License

