jbrry / Second_Order_Parsing

[ACL 2019/AACL 2020] Second-Order Syntactic/Semantic Dependency Parsing With Mean Field Variational Inference (PyTorch)

Second-Order Syntactic/Semantic Dependency Parser

An implementation of our AACL 2020 paper "Second-Order Neural Dependency Parsing with Message Passing and End-to-End Training" and a new version of our ACL 2019 paper "Second-Order Semantic Dependency Parsing with End-to-End Neural Networks".
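Both papers approximate second-order inference with mean-field variational inference: the posterior over arcs is factorized into independent per-arc beliefs, which are refined for a fixed number of iterations so the whole procedure can be unrolled and trained end to end. The pure-Python sketch below illustrates one such update for the binary ("Single") setting under a simplifying assumption: it uses only sibling factors (pairs of arcs sharing a head), while the actual models may combine several kinds of second-order parts and run batched on tensors. The names `mean_field_siblings`, `arc`, `sib` and `iterations` are hypothetical and do not come from this repository.

```python
import math


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def mean_field_siblings(arc, sib, iterations=3):
    """Approximate arc beliefs q[h][d] = Q(arc h -> d exists).

    arc[h][d]    : first-order score of arc h -> d (hypothetical input)
    sib[h][d][k] : second-order score when arcs h -> d and h -> k both exist
    """
    n = len(arc)
    # Initialise with the first-order (independent) beliefs; no self-loops.
    q = [[0.0 if h == d else sigmoid(arc[h][d]) for d in range(n)] for h in range(n)]
    for _ in range(iterations):
        new_q = [[0.0] * n for _ in range(n)]
        for h in range(n):
            for d in range(n):
                if h == d:
                    continue
                # Expected sibling score, weighted by the current beliefs
                # about the other arcs that share head h.
                message = sum(
                    q[h][k] * sib[h][d][k] for k in range(n) if k not in (h, d)
                )
                new_q[h][d] = sigmoid(arc[h][d] + message)
        q = new_q  # synchronous update, as in an unrolled network
    return q
```

With all sibling scores set to zero the updates leave the first-order beliefs unchanged; positive sibling scores raise the belief in arcs whose head is likely to have other dependents.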

The code is based on an old version of SuPar.

Compared with the original code, we use the MST algorithm instead of the Eisner algorithm for syntactic dependency parsing. The code can also concatenate word, POS tag, character and BERT embeddings as token representations.
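MST decoding for dependency parsing is commonly implemented with the Chu-Liu/Edmonds algorithm, which, unlike Eisner's algorithm, can return non-projective trees. The following pure-Python sketch illustrates that algorithm on a dense score matrix `scores[h][d]` (score of head `h` for dependent `d`, with token 0 as the artificial root). The function names are hypothetical, this is not this repository's implementation, and it does not enforce the single-root constraint that a parser may additionally apply.

```python
NEG_INF = float("-inf")


def find_cycle(heads):
    """Return the nodes of one cycle in heads, or None (node 0 is the root)."""
    state = [0] * len(heads)  # 0 = unseen, 1 = on current path, 2 = finished
    state[0] = 2
    for start in range(1, len(heads)):
        path, node = [], start
        while state[node] == 0:
            state[node] = 1
            path.append(node)
            node = heads[node]
        if state[node] == 1:  # walked back onto the current path
            return path[path.index(node):]
        for v in path:
            state[v] = 2
    return None


def chu_liu_edmonds(scores):
    """Maximum spanning arborescence rooted at 0; returns heads (heads[0] = -1)."""
    n = len(scores)
    # 1. Greedy step: each non-root token takes its best-scoring head.
    heads = [-1] + [
        max((h for h in range(n) if h != d), key=lambda h: scores[h][d])
        for d in range(1, n)
    ]
    cycle = find_cycle(heads)
    if cycle is None:
        return heads
    # 2. Contract the cycle into a single new node c.
    in_cycle = set(cycle)
    rest = [v for v in range(n) if v not in in_cycle]  # rest[0] is the root
    new_id = {v: i for i, v in enumerate(rest)}
    c = len(rest)
    new_scores = [[NEG_INF] * (c + 1) for _ in range(c + 1)]
    enter, leave = {}, {}
    for u in rest:
        for v in rest:
            if u != v:
                new_scores[new_id[u]][new_id[v]] = scores[u][v]
        # Entering the cycle at v replaces the cycle arc heads[v] -> v.
        v_best = max(cycle, key=lambda v: scores[u][v] - scores[heads[v]][v])
        enter[u] = v_best
        new_scores[new_id[u]][c] = scores[u][v_best] - scores[heads[v_best]][v_best]
    for v in rest[1:]:
        # Leaving the cycle: the best cycle node to serve as head of v.
        u_best = max(cycle, key=lambda u: scores[u][v])
        leave[v] = u_best
        new_scores[c][new_id[v]] = scores[u_best][v]
    # 3. Solve the smaller problem, then 4. expand c back into the cycle.
    sub_heads = chu_liu_edmonds(new_scores)
    result = list(heads)  # cycle tokens keep their cycle heads by default
    for v in rest[1:]:
        h = sub_heads[new_id[v]]
        result[v] = leave[v] if h == c else rest[h]
    u = rest[sub_heads[c]]
    result[enter[u]] = u
    return result
```

The recursive contract-solve-expand structure keeps the sketch short; it is meant for illustration rather than speed, since batched parsers usually decode score tensors produced by the network.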

Requirements

Datasets

The model is evaluated on the Stanford Dependency conversion (v3.3.0) of the English Penn Treebank, with POS tags predicted by the Stanford POS tagger.

For all datasets, we follow the conventional data splits:

  • Train: 02-21 (39,832 sentences)
  • Dev: 22 (1,700 sentences)
  • Test: 23 (2,416 sentences)

Performance for Syntactic Dependency Parsing

Model                              UAS    LAS    Speed (sents/s)
Single1O + TAG + MST               95.75  94.04  1123
Local1O + TAG + MST                95.83  94.23  1150
Single2O + TAG + MST               95.86  94.19  966
Local2O + TAG + MST                95.98  94.34  1006
Local2O + MST (Best)               96.12  94.47  1006
CRF2O (Best) (Zhang et al., 2020)  96.14  94.49  400

Here, 1O denotes first-order, 2O second-order, Single binary classification, and Local head selection. Results are averaged over 5 runs; Best denotes the test result of the single run with the best development performance. Punctuation is ignored in all evaluation metrics for PTB.
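UAS is the percentage of tokens whose predicted head is correct, and LAS additionally requires the correct dependency label. The sketch below computes both while skipping punctuation. As an assumption, a token is treated as punctuation when every character belongs to a Unicode punctuation category; this may differ from the exact punctuation rule behind the numbers above, and the function names are hypothetical.

```python
import unicodedata


def is_punct(word):
    # Assumption: punctuation = every character has a Unicode "P*" category.
    return bool(word) and all(unicodedata.category(ch).startswith("P") for ch in word)


def attachment_scores(words, gold_heads, gold_rels, pred_heads, pred_rels):
    """Return (UAS, LAS) in percent, ignoring punctuation tokens."""
    total = correct_heads = correct_labeled = 0
    for word, gh, gr, ph, pr in zip(words, gold_heads, gold_rels, pred_heads, pred_rels):
        if is_punct(word):
            continue  # punctuation is excluded from both metrics
        total += 1
        if gh == ph:
            correct_heads += 1
            if gr == pr:  # LAS needs both the head and the label right
                correct_labeled += 1
    if total == 0:
        return 0.0, 0.0
    return 100.0 * correct_heads / total, 100.0 * correct_labeled / total
```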

Usage

You can start training, evaluation and prediction with the subcommands registered in parser.cmds.

To train a syntactic parser, run:

$ CUDA_VISIBLE_DEVICES=0 python3 -u run.py train --conf config/3iter_100binary_0init_ptb_full_tree_0.cfg

To train a semantic parser, point the dataset splits in the config file to your semantic dependency data, then set tree = False and binary = True. The binary structure also makes it possible to train an Enhanced Universal Dependencies (EUD) parser; however, for better EUD results, please use MultilangStructureKD.

All the data files must follow the CoNLL-U format.
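CoNLL-U files store one token per line with 10 tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC), separate sentences with blank lines, and mark comments with a leading #. Tree annotations use HEAD and DEPREL, while graph annotations such as EUD put head:label pairs, separated by |, in DEPS. The hypothetical `read_conllu` below is a minimal sketch of that layout, not this repository's data loader; it skips multiword-token ranges (e.g. 1-2) and empty nodes (e.g. 8.1).

```python
def read_conllu(text):
    """Parse CoNLL-U text into sentences (lists of token dicts)."""
    sentences, tokens = [], []
    for line in text.splitlines():
        if not line.strip():  # a blank line ends the current sentence
            if tokens:
                sentences.append(tokens)
                tokens = []
            continue
        if line.startswith("#"):  # sentence-level comments, e.g. "# sent_id = 1"
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 tab-separated columns, got {len(cols)}: {line!r}")
        if "-" in cols[0] or "." in cols[0]:
            continue  # multiword-token range or empty node
        deps = []
        if cols[8] != "_":
            for item in cols[8].split("|"):
                head, label = item.split(":", 1)  # labels may contain ":" (e.g. nmod:poss)
                deps.append((int(head) if head.isdigit() else head, label))
        tokens.append({
            "id": int(cols[0]),
            "form": cols[1],
            "upos": cols[3],
            "head": None if cols[6] == "_" else int(cols[6]),
            "deprel": cols[7],
            "deps": deps,
        })
    if tokens:  # the file may not end with a blank line
        sentences.append(tokens)
    return sentences
```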

Other code

  • TensorFlow version of the semantic dependency parser: Second_Order_SDP.
  • PyTorch version of the enhanced universal dependencies parser: MultilangStructureKD.
  • An application of mean-field variational inference to sequence labeling: AIN.
  • PyTorch version of the biaffine parser: parser.

References

About

License: MIT License

