Dataset and Codes for our EMNLP 2022 Main Conference Long Paper titled "ECTSum: A New Benchmark Dataset For Bullet Point Summarization of Long Earnings Call Transcripts"
The ECTSum dataset can be found under the data folder.
Code
Code and instructions for our proposed model, ECT-BPS, can be found under codes/ECT-BPS.
Code and instructions for our baseline models can be found under codes/baselines.
Data Preparation for ECT-BPS
Preparing the data for training the Extractive Module
The data is saved at codes/ECT-BPS/ectbps_para/data/para/.
Processed data is already uploaded at this location.
Prepare the data with numerical values masked
python prepare_data_ectbps_para_mask.py
Data Location
The data is saved at codes/ECT-BPS/ectbps_para/data/para_mask/.
Processed data is already uploaded at this location.
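To illustrate what masking numerical values in a sentence can look like, here is a minimal sketch. It is not taken from prepare_data_ectbps_para_mask.py: the placeholder token `[NUM]`, the regular expression, and the function name `mask_numericals` are all assumptions made for illustration, and the actual script may mask values differently.

```python
import re

# Hypothetical placeholder token; the repository's script may use a different one.
MASK_TOKEN = "[NUM]"

# Matches a number that does not directly follow a letter or digit, with an
# optional leading "$", optional thousands separators, optional decimals,
# and an optional trailing "%". This pattern is an illustrative assumption.
NUMBER_PATTERN = re.compile(r"(?<![A-Za-z0-9])\$?\d+(?:,\d{3})*(?:\.\d+)?%?")


def mask_numericals(sentence: str) -> str:
    """Replace every numerical value in the sentence with MASK_TOKEN."""
    return NUMBER_PATTERN.sub(MASK_TOKEN, sentence)


if __name__ == "__main__":
    example = "Revenue rose 12.5% to $1,234.5 million in Q3."
    print(mask_numericals(example))
```

Digits attached to a letter, such as the "3" in "Q3", are left untouched in this sketch, so quarter labels stay readable while amounts and percentages are masked.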
Updates
1st November 2022 - ECTSum Dataset released
30th November 2022 - Code and instructions released for training the Extractive Module of ECT-BPS
3rd March 2023 - Prediction pipeline added for the Extractive Module
5th March 2023 - Code released to prepare the data for training the Paraphrasing Module
7th March 2023 - Code released to train the Paraphrasing Module of ECT-BPS
8th March 2023 - Google Colab notebook released for training and testing the Paraphrasing Module