# iornn-depparse

Reranking-based dependency parsing with an inside-outside recursive neural network

## Update

1. The MSTParser can now produce top-K dependency parses with the 1st-order model.

A Lua implementation of the reranking-based dependency parser using inside-outside recursive neural networks described in

[1] Phong Le and Willem Zuidema (2014). The Inside-Outside Recursive Neural Network model for Dependency Parsing. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP).

Written and maintained by Phong Le (p.le [at] uva.nl)

### Package

This package contains three components:

  * source/ - source code files in Lua of the IORNN reranker,

  * tools/mstparser/ - the MSTParser 0.5.1, in which the option of generating k-best candidates is "unlocked",

  * data/wsj-dep/universal/dic/ - the word list, POS list, dependency relation list, and Collobert & Weston word embeddings for our experiments on the WSJ-U (see the loading sketch after this list).
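As a rough illustration of how such resources are consumed, here is a minimal Lua/Torch sketch for reading a word list and a plain-text embedding matrix. The file names (words.lst, embeddings.txt) and the 50-dimension assumption are placeholders for illustration only, not the actual layout of dic/.

```lua
-- Hedged sketch: load a word list and plain-text embeddings into Torch.
-- File names and the 50-d assumption are placeholders, not the actual
-- contents of data/wsj-dep/universal/dic/.
require 'torch'

local function load_word_list(path)
  local word2id, n = {}, 0
  for line in io.lines(path) do
    n = n + 1
    word2id[line] = n          -- one word per line, id = line number
  end
  return word2id, n
end

local function load_embeddings(path, n_words, dim)
  local emb = torch.zeros(n_words, dim)
  local row = 0
  for line in io.lines(path) do
    row = row + 1
    local col = 0
    for v in line:gmatch('%S+') do
      col = col + 1
      emb[row][col] = tonumber(v)   -- one whitespace-separated vector per line
    end
  end
  return emb
end

-- placeholder file names, for illustration only
local word2id, n_words = load_word_list('words.lst')
local emb = load_embeddings('embeddings.txt', n_words, 50)
```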

### Installation

Install Torch7.

Compile the MSTParser following tools/mstparser/README.

### Usage

The following instructions replicate the results on the WSJ-U reported in the paper. Small changes are needed for your own data.

#### Data

Convert the WSJ to dependencies using the Universal Dependency Treebank Tool.

Split the treebank into train, dev, and test portions, and store them as train.conll, dev.conll, and test.conll in the folder data/wsj-dep/universal/data/.
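The .conll files use the standard CoNLL dependency format: one token per line with tab-separated columns, and a blank line between sentences. Purely as an illustration, a minimal Lua reader for such files might look like the sketch below; this is not the loader used in source/, and the column positions assume the usual CoNLL-X layout.

```lua
-- Minimal sketch of a CoNLL-X reader (not the project's actual loader).
-- Each token line has tab-separated fields; sentences are separated by blank lines.
local function read_conll(path)
  local sentences, sent = {}, {}
  for line in io.lines(path) do
    if line:match('^%s*$') then
      if #sent > 0 then table.insert(sentences, sent); sent = {} end
    else
      local fields = {}
      for f in line:gmatch('[^\t]+') do table.insert(fields, f) end
      -- typical CoNLL-X columns: id, form, lemma, cpos, pos, feats, head, deprel, ...
      table.insert(sent, { id = tonumber(fields[1]), form = fields[2],
                           pos = fields[5], head = tonumber(fields[7]),
                           deprel = fields[8] })
    end
  end
  if #sent > 0 then table.insert(sentences, sent) end
  return sentences
end

-- e.g. (path assumes you run this from source/)
local train = read_conll('../data/wsj-dep/universal/data/train.conll')
print(#train .. ' sentences loaded')
```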

#### Generate k-best lists

Execute

 cd tools/mstparser/
 ./train.sh #train the MSTParser on train.conll (you may need to change paths)
 ./kbest.sh #generate k-best lists for dev.conll and test.conll (you may need to change paths and K)
 cp experiment/dev-10-best-mst2ndorder.* experiment/test-10-best-mst2ndorder.* ../../data/wsj-dep/universal/data/
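The generated k-best files contain several candidate parses per sentence. Assuming (this is an assumption about the file layout, not documented behaviour) that the K candidates for each sentence are simply concatenated as consecutive CoNLL blocks, grouping them back per sentence is straightforward:

```lua
-- Hypothetical sketch: group every K consecutive candidate parses per sentence,
-- assuming the k-best .conll file simply concatenates the K candidates.
local function group_kbest(parses, k)
  local grouped = {}
  for i = 1, #parses, k do
    local cands = {}
    for j = i, math.min(i + k - 1, #parses) do
      table.insert(cands, parses[j])
    end
    table.insert(grouped, cands)
  end
  return grouped
end

-- e.g. with the read_conll sketch above:
-- local kbest = group_kbest(read_conll('dev-10-best-mst2ndorder.conll'), 10)
```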

#### Train the reranker

Open source/dp_spec.lua, which stores the default parameter values and file names. Set

K = 10  
alpha = 0   
K_range = nil
alpha_range = nil 

Execute

cd source/
mkdir your_model_dir   
nohup th train_depparser_rerank.lua ../data/wsj-dep/universal/dic ../data/wsj-dep/universal/data collobert your_model_dir 200 >& log_train &

A trained model is stored in your_model_dir after each epoch. The argument 200 is the dimension of the network used in our experiments. All intermediate output is written to log_train.

After each epoch, dev.conll is evaluated with UAS and LAS metrics. Pick the model that achieves the highest UAS (model_29 in our experiments).
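UAS is the fraction of tokens whose predicted head is correct; LAS additionally requires the dependency relation label to match. A minimal Lua sketch of the two metrics (not the package's evaluation code, which may, for instance, also exclude punctuation tokens):

```lua
-- Sketch of UAS/LAS: gold_sents and pred_sents are parallel lists of sentences,
-- each token carrying .head and .deprel (as produced by a CoNLL reader).
local function uas_las(gold_sents, pred_sents)
  local total, head_ok, label_ok = 0, 0, 0
  for s = 1, #gold_sents do
    for t = 1, #gold_sents[s] do
      local g, p = gold_sents[s][t], pred_sents[s][t]
      total = total + 1
      if g.head == p.head then
        head_ok = head_ok + 1
        if g.deprel == p.deprel then label_ok = label_ok + 1 end
      end
    end
  end
  return head_ok / total, label_ok / total  -- UAS, LAS
end
```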

#### Optimise K and alpha

Open source/dp_spec.lua, set K_range = {1,10}, alpha_range = {0,1}.

Execute

th eval_depparser_rerank.lua your_model_path ../data/wsj-dep/universal/data/dev.conll ../data/wsj-dep/universal/data/dev-10-best-mst2ndorder.conll your_output

Pick K and alpha that achieve the highest UAS (9 and 0.68 in our experiments).
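For context, K limits how many of the k-best candidates the reranker considers, and alpha interpolates the MSTParser score with the IORNN score when choosing a candidate, following the mixture reranker described in the paper. The exact weighting convention in the sketch below is an assumed reading, and the .mst_score / .iornn_score field names are hypothetical. A simplified Lua sketch of such a grid search:

```lua
-- Simplified sketch of the K/alpha grid search (not the actual eval_depparser_rerank.lua).
-- Assumed scoring: mix = alpha * mst_score + (1 - alpha) * iornn_score over the top-K candidates.
local function grid_search(candidates, eval_uas)
  local best = { uas = -1 }
  for K = 1, 10 do                      -- K_range = {1,10}
    for a = 0, 100 do                   -- alpha_range = {0,1}, stepped by 0.01
      local alpha = a / 100
      local picked = {}
      for s, cands in ipairs(candidates) do
        local best_cand, best_score = nil, -math.huge
        for k = 1, math.min(K, #cands) do
          local c = cands[k]
          local score = alpha * c.mst_score + (1 - alpha) * c.iornn_score
          if score > best_score then best_score, best_cand = score, c end
        end
        picked[s] = best_cand
      end
      local uas = eval_uas(picked)      -- evaluate the picked parses against gold
      if uas > best.uas then best = { uas = uas, K = K, alpha = alpha } end
    end
  end
  return best
end
```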

#### Evaluation

Open source/dp_spec.lua, set K and alpha to the values found above, and set K_range = nil and alpha_range = nil.

Execute

th eval_depparser_rerank.lua your_model_path ../data/wsj-dep/universal/data/test.conll ../data/wsj-dep/universal/data/test-10-best-mst2ndorder.conll your_output

You should get a UAS around 93.08%.

License: Apache License 2.0

