seanliu96 / paragraph-level_implicit_discourse_relation_classification

Code for the NAACL 2018 paper "Improving Implicit Discourse Relation Classification by Modeling Inter-dependencies of Discourse Units in a Paragraph"

paragraph_implicit_discourse_relations

@inproceedings{dai2018improving,
  title={Improving Implicit Discourse Relation Classification by Modeling Inter-dependencies of Discourse Units in a Paragraph},
  author={Dai, Zeyu and Huang, Ruihong},
  booktitle={Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)},
  volume={1},
  pages={141--151},
  year={2018}
}

To run the code:

  1. Download the preprocessed PDTB v2.0 data in .pt format (all of the word/POS/NER/label information (both implicit and explicit) and the discourse unit (DU) boundary information has already been converted to PyTorch tensor format) and put it in the folder ./data/
  2. For the model without CRF, run python run_discourse_parsing.py
  3. For the model with CRF, run python run_CRF_discourse_parsing.py
  4. For binary classification, run python run_binary_target_discourse_parsing.py
  5. Hyperparameters can be changed in each .py file, above the main() function (there is no separate config file, sorry). Feel free to contact me if you need a pretrained model file.
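The three entry points above take no command-line arguments. As a minimal launcher sketch (only the script names come from this README; the task names and the build_command helper are illustrative), the choice of model variant could be wrapped like this:

```python
import subprocess
import sys

# Maps a short task name to the entry scripts listed above.
# The script names come from this README; the mapping itself is illustrative.
ENTRY_POINTS = {
    "no_crf": "run_discourse_parsing.py",               # model without CRF
    "crf": "run_CRF_discourse_parsing.py",              # model with CRF
    "binary": "run_binary_target_discourse_parsing.py", # binary classification
}

def build_command(task):
    """Return the command line that launches the chosen model variant."""
    if task not in ENTRY_POINTS:
        raise ValueError("unknown task: %r" % (task,))
    return [sys.executable, ENTRY_POINTS[task]]

if __name__ == "__main__":
    # Uncomment to actually launch (assumes the script and ./data/ exist):
    # subprocess.call(build_command("crf"))
    print(build_command("crf"))
```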

About preprocessing:

  1. Download both the Google word2vec embeddings and the preprocessed POS/NER files (you can also generate the latter yourself with the Stanford CoreNLP toolkit) and put them in ./data/resource
  2. The raw PDTB v2.0 dataset files are already in ./data/preprocess/dataset/
  3. Run python pdtb_preprocess_moreexpimp_paragraph.py
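The exact file names expected under ./data/resource are not spelled out here, so the names in this sketch are assumptions; a small stdlib check like the following can confirm the resources are in place before running the preprocessing script:

```python
import os

# Assumed file names -- the README does not list them explicitly, so adjust
# these to match whatever you actually placed in ./data/resource.
RESOURCE_DIR = os.path.join("data", "resource")
EXPECTED_FILES = [
    "GoogleNews-vectors-negative300.bin",  # Google word2vec (assumed name)
    "pos_tags.txt",                        # preprocessed POS file (assumed name)
    "ner_tags.txt",                        # preprocessed NER file (assumed name)
]

def missing_resources(resource_dir=RESOURCE_DIR, expected=EXPECTED_FILES):
    """Return the expected resource files that are not present yet."""
    return [name for name in expected
            if not os.path.exists(os.path.join(resource_dir, name))]

if __name__ == "__main__":
    gone = missing_resources()
    if gone:
        print("Missing from %s: %s" % (RESOURCE_DIR, ", ".join(gone)))
    else:
        print("All resources found; run pdtb_preprocess_moreexpimp_paragraph.py")
```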

Package versions:
python == 2.7.10
torch == 0.3.0
nltk >= 3.2.2
gensim >= 0.13.2
numpy >= 1.13.1
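Since the code targets Python 2.7 and an old torch release, a quick interpreter check (a sketch of my own, not part of the repo) can save a confusing failure later:

```python
import sys

def is_supported_python(version_info=sys.version_info):
    """True when running the Python 2.7 interpreter the versions above target."""
    return version_info[0] == 2 and version_info[1] == 7

if __name__ == "__main__":
    if not is_supported_python():
        sys.stderr.write("Warning: this code targets Python 2.7.10; "
                         "you are on %d.%d.\n" % tuple(sys.version_info[:2]))
```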
