seanliu96 / paragraph-level_implicit_discourse_relation_classification

Code for the NAACL 2018 paper "Improving Implicit Discourse Relation Classification by Modeling Inter-dependencies of Discourse Units in a Paragraph"

paragraph_implicit_discourse_relations

@inproceedings{dai2018improving,
  title={Improving Implicit Discourse Relation Classification by Modeling Inter-dependencies of Discourse Units in a Paragraph},
  author={Dai, Zeyu and Huang, Ruihong},
  booktitle={Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)},
  volume={1},
  pages={141--151},
  year={2018}
}

To run the code:

  1. Download the preprocessed PDTB v2.0 data in .pt format (all of the word/POS/NER/label information (both implicit and explicit) and the discourse unit (DU) boundary information has already been converted to PyTorch tensor format) and put it in the folder ./data/
  2. For the model without CRF, run python run_discourse_parsing.py
  3. For the model with CRF, run python run_CRF_discourse_parsing.py
  4. For binary classification, run python run_binary_target_discourse_parsing.py
  5. Hyperparameters can be changed in each .py file, above the main() function (there is no separate config file, sorry). Feel free to contact me if you need a pretrained model file.
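The three entry points above take no command-line arguments. As a minimal launcher sketch (only the script names come from this README; the task names and the build_command helper are illustrative), the choice of model variant could be wrapped like this:

```python
import subprocess
import sys

# Maps a short task name to the entry scripts listed above.
# The script names come from this README; the mapping itself is illustrative.
ENTRY_POINTS = {
    "no_crf": "run_discourse_parsing.py",               # model without CRF
    "crf": "run_CRF_discourse_parsing.py",              # model with CRF
    "binary": "run_binary_target_discourse_parsing.py", # binary classification
}

def build_command(task):
    """Return the command line that launches the chosen model variant."""
    if task not in ENTRY_POINTS:
        raise ValueError("unknown task: %r" % (task,))
    return [sys.executable, ENTRY_POINTS[task]]

if __name__ == "__main__":
    # Uncomment to actually launch (assumes the script and ./data/ exist):
    # subprocess.call(build_command("crf"))
    print(build_command("crf"))
```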

About preprocessing:

  1. Download both the Google word2vec embeddings and the preprocessed POS/NER files (you can also generate the latter yourself with the Stanford CoreNLP toolkit) and put them in ./data/resource
  2. The raw PDTB v2.0 dataset files are already in ./data/preprocess/dataset/
  3. Run python pdtb_preprocess_moreexpimp_paragraph.py
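The exact file names expected under ./data/resource are not spelled out here, so the names in this sketch are assumptions; a small stdlib check like the following can confirm the resources are in place before running the preprocessing script:

```python
import os

# Assumed file names -- the README does not list them explicitly, so adjust
# these to match whatever you actually placed in ./data/resource.
RESOURCE_DIR = os.path.join("data", "resource")
EXPECTED_FILES = [
    "GoogleNews-vectors-negative300.bin",  # Google word2vec (assumed name)
    "pos_tags.txt",                        # preprocessed POS file (assumed name)
    "ner_tags.txt",                        # preprocessed NER file (assumed name)
]

def missing_resources(resource_dir=RESOURCE_DIR, expected=EXPECTED_FILES):
    """Return the expected resource files that are not present yet."""
    return [name for name in expected
            if not os.path.exists(os.path.join(resource_dir, name))]

if __name__ == "__main__":
    gone = missing_resources()
    if gone:
        print("Missing from %s: %s" % (RESOURCE_DIR, ", ".join(gone)))
    else:
        print("All resources found; run pdtb_preprocess_moreexpimp_paragraph.py")
```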

Package versions:
python == 2.7.10
torch == 0.3.0
nltk >= 3.2.2
gensim >= 0.13.2
numpy >= 1.13.1
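Since the code targets Python 2.7 and an old torch release, a quick interpreter check (a sketch of my own, not part of the repo) can save a confusing failure later:

```python
import sys

def is_supported_python(version_info=sys.version_info):
    """True when running the Python 2.7 interpreter the versions above target."""
    return version_info[0] == 2 and version_info[1] == 7

if __name__ == "__main__":
    if not is_supported_python():
        sys.stderr.write("Warning: this code targets Python 2.7.10; "
                         "you are on %d.%d.\n" % tuple(sys.version_info[:2]))
```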
