jbrry / Enhanced-UD-Parsing

Repository for the ADAPT submission to the 2020 IWPT Shared Task on Enhanced Universal Dependency Parsing

Enhanced UD Parsing

Repository for The ADAPT Enhanced Dependency Parser at the IWPT 2020 Shared Task (team ADAPT).

Semantic Parser Installation

This project contains the modules tagging and tagging_stable, which use the AllenNLP master branch and AllenNLP 0.9.0, respectively.

If using Conda, create a Python 3.7 environment for the version you need:

# stable
conda create -n enhanced_parsing_stable python=3.7

# dev
conda create -n enhanced_parsing_dev python=3.7

Activate the Conda environment and install the dependencies:

# stable
conda activate enhanced_parsing_stable
pip install -r requirements_stable.txt

# dev
conda activate enhanced_parsing_dev
pip install -r requirements_dev.txt
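
To confirm that each environment picked up the intended AllenNLP version, here is a quick sanity check (assuming the requirements files pin AllenNLP 0.9.0 and a master-branch build respectively, as described above):

# stable environment should report 0.9.0
conda activate enhanced_parsing_stable
python -c "from allennlp.version import VERSION; print(VERSION)"

# dev environment should report a development build
conda activate enhanced_parsing_dev
python -c "from allennlp.version import VERSION; print(VERSION)"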

If using virtualenv, create environments for the stable and bleeding-edge versions as follows. Here we are testing with Python 3.6.

Virtual environment for stable 0.9.0

mkdir venv
cd venv
virtualenv -p /usr/bin/python3.6 allennlp-090

If needed, edit allennlp-090/bin/activate to configure your CUDA environment, e.g.

# Manually added configuration
LD_LIBRARY_PATH=/usr/local/cuda-10.1/lib64:"$LD_LIBRARY_PATH"
export LD_LIBRARY_PATH

Activate and install packages:

source allennlp-090/bin/activate
pip install torch torchvision
pip install cython
pip install allennlp==0.9.0
pip install -U conllu
deactivate

Virtual environment for bleeding edge

mkdir -p venv
cd venv
virtualenv -p /usr/bin/python3.6 allennlp-dev

If needed, edit allennlp-dev/bin/activate to configure your CUDA environment, e.g.

# Manually added configuration
LD_LIBRARY_PATH=/usr/local/cuda-10.1/lib64:"$LD_LIBRARY_PATH"
export LD_LIBRARY_PATH

Activate and install packages:

source allennlp-dev/bin/activate
pip install torch torchvision
pip install cython
pip install allennlp==1.0.0.dev20200418
pip install -U conllu
deactivate
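
Whichever virtual environment you set up, it is worth confirming that PyTorch can see your GPU after configuring LD_LIBRARY_PATH. A minimal check (the CUDA path above is only an example and may differ on your machine):

source allennlp-dev/bin/activate   # or allennlp-090/bin/activate
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
deactivate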

Obtain data

You will need to obtain the official shared task training and development data:

cd data
wget http://ufal.mff.cuni.cz/~zeman/soubory/iwpt2020-train-dev.tgz
tar -xvzf iwpt2020-train-dev.tgz
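
To verify that the archive unpacked correctly, list the extracted treebank directories and their .conllu files (exact directory names depend on the shared-task release, so the output is only indicative):

# still inside data/
ls
find . -name "*.conllu" | head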

Train semantic parser models

You can use train.sh with the following arguments:

- ${task_type}: either basic or enhanced
- ${model_type}: either dm or kg, depending on the type of parser being used
- ${tbid}: the treebank ID, e.g. en_ewt
- ${random_seed}: the random seed, e.g. 12345
- ${package}: the package version, either tagging_stable or tagging

./scripts/train.sh ${task_type} ${model_type} ${tbid} ${random_seed} ${package}
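
For example, to train an enhanced parser with the DM model on en_ewt using random seed 12345 and the stable package:

./scripts/train.sh enhanced dm en_ewt 12345 tagging_stable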

Please see reproducibility/iwpt-2020 for more details about the models used for our submission to the IWPT 2020 Shared Task.

Basic Parser

Citing

If you wish to cite this paper, or if you use the software in your research, please use the reference below:

@InProceedings{barry-wagner-foster:2020:iwpt,
  author    = {Barry, James  and  Wagner, Joachim  and  Foster, Jennifer},
  title     = {The {ADAPT} Enhanced Dependency Parser at the {IWPT} 2020 Shared Task},
  booktitle      = {Proceedings of the 16th International Conference on Parsing Technologies and the IWPT 2020 Shared Task on Parsing into Enhanced Universal Dependencies},
  month          = jul,
  year           = {2020},
  address        = {Online},
  publisher      = {Association for Computational Linguistics},
  pages     = {227--235},
  abstract  = {We describe the ADAPT system for the 2020 IWPT Shared Task on parsing enhanced Universal Dependencies in 17 languages. We implement a pipeline approach using UDPipe and UDPipe-future to provide initial levels of annotation. The enhanced dependency graph is either produced by a graph-based semantic dependency parser or is built from the basic tree using a small set of heuristics. Our results show that, for the majority of languages, a semantic dependency parser can be successfully applied to the task of parsing enhanced dependencies. Unfortunately, we did not ensure a connected graph as part of our pipeline approach and our competition submission relied on a last-minute fix to pass the validation script which harmed our official evaluation scores significantly. Our submission ranked eighth in the official evaluation with a macro-averaged coarse ELAS F1 of 67.23 and a treebank average of 67.49. We later implemented our own graph-connecting fix which resulted in a score of 79.53 (language average) or 79.76 (treebank average), which would have placed fourth in the competition evaluation.},
  url       = {https://www.aclweb.org/anthology/2020.iwpt-1.24}
}
