jackylee1993 / pdtb-parser

A PDTB-Styled End-to-End Discourse Parser

Home Page:http://wing.comp.nus.edu.sg/~linzihen/parser/

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

#A PDTB-Styled End-to-End Discourse Parser

##Demo

You can try out the parser at http://wing.comp.nus.edu.sg/~linzihen/parser/

##Required libraries

  • Ruby (1.8.7)

  • Rubygems

  • OpenNLP Tools (opennlp-tools-1.3.0)

    opennlp-tools-1.3.0.jar and jwnl-1.3.3.jar of the package needed

  • OpenNLP MaxEnt (maxent-2.5.2)

    maxent-2.5.2.jar and trove.jar of the package needed

  • Stanford Parser (stanford-parser-2010-08-20)

  • Morpha (http://www.informatics.susx.ac.uk/research/groups/nlp/carroll/morph.html)

    You need to use one of morpha.{ix86_linux|ppc_darwin|sun4_sunos} depending on your system

    Note: The OpenNLP tools, MaxEnt, Stanford Parser and Morpha libraries are included under lib/ in this distribution.

##Install

  • Install the required libraries

  • Change line 3 - 6 of src/variable.rb to point to your downloaded Stanford Parser, OpenNLP Tools, OpenNLP MaxEnt, and morpha.

##Running the parser

./parse.rb text-file

The input text is raw text in which paragraphs are separated by empty lines.

E.g., ./parse.rb ../test1

##Output format

###Explicit relations:

Example (1):

{Exp_2_conn_Concession Although Exp_2_conn} {Exp_2_Arg2 preliminary findings were reported more than a year ago , Exp_2_Arg2} {Exp_2_Arg1 the latest results appear in today 's New England Journal of Medicine , a forum likely to bring new attention to the problem . Exp_2_Arg1}

Explicit relations consist of 16 PDTB level-2 types: Asynchronous, Synchrony, Cause, Pragmatic_cause, Condition, Pragmatic_condition, Contrast, Pragmatic_contrast, Concession, Pragmatic_concession, Conjunction, Instantiation, Restatement, Alternative, Exception List

The connective string of an explicit relation is enclosed by {Exp_id_conn_type and Exp_id_conn}, where id and type are the id and relation type assigned to this explicit relation. Arg1 span is enclosed by {Exp_id_Arg1 and Exp_id_Arg1}, and Arg2 span is enclosed by {Exp_id_Arg2 and Exp_id_Arg2}.

###Non-Explicit relations:

Example (2):

{NonExp_4_Arg1 Neither Lorillard nor the researchers who studied the workers were aware of any research on smokers of the Kent cigarettes . NonExp_4_Arg1} {NonExp_4_Arg2_Instantiation `` We have no useful information on whether users are at risk , '' {Attr_2 said James A. Talcott of Boston 's Dana-Farber Cancer Institute . NonExp_4_Arg2} Attr_2}

All Implicit, AltLex, EntRel and NoRel relations defined in the PDTB are lumped into Non-Explicit relations. They consist of 13 types: EntRel, NoRel, plus 11 PDTB level-2 types (Asynchronous, Synchrony, Cause, Pragmatic_cause, Contrast, Concession, Conjunction, Instantiation, Restatement, Alternative, List).

Condition, Pragmatic_condition, Pragmatic_contrast, Pragmatic_concession, and Exception are not included due to lack of training instances.

Arg1 span is enclosed by {NonExp_id_Arg1 and NonExp_id_Arg1}, and Arg2 span is enclosed by {NonExp_id_Arg2 and NonExp_id_Arg2}.

###Attribution spans:

Attribution spans are enclosed by {Attr_id and Attr_id}. See example (2): {Attr_2 said James A. Talcott of Boston 's Dana-Farber Cancer Institute . Attr_2}

##How to cite this work

Ziheng Lin, Hwee Tou Ng and Min-Yen Kan (2014). A PDTB-Styled End-to-End Discourse Parser. Natural Language Engineering, 20, pp 151-184. Cambridge University Press.

##Copyright notice and statement of copying permission

Copyright 2010-2012 Ziheng Lin

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.

About

A PDTB-Styled End-to-End Discourse Parser

http://wing.comp.nus.edu.sg/~linzihen/parser/

License:GNU General Public License v3.0