xsthunder / transition-amr-parser

Transition-based parser for Abstract Meaning Representation (AMR) in Pytorch


Transition-based AMR Parser

Transition-based parser for Abstract Meaning Representation (AMR) in PyTorch, version 0.4.2. The current code implements the Action-Pointer Transformer model (Zhou et al. 2021) from NAACL 2021. The model yields 81.8 Smatch (83.4 with silver data and partial ensemble) on the AMR2.0 test set; due to improvements in the aligner implementation, this code reaches 82.1 on AMR2.0 test.

Check out the stack-transformer branch for the stack-Transformer model (Fernandez Astudillo et al. 2020) from EMNLP Findings 2020. It yields 80.2 Smatch (81.3 with self-learning) on the AMR2.0 test set (this code reaches 80.5 due to the aligner implementation). The stack-Transformer can be used to reproduce our work on self-learning and cycle consistency in AMR parsing (Lee et al. 2020) from EMNLP Findings 2020, alignment-based multilingual AMR parsing (Sheth et al. 2021) from EACL 2021, and Knowledge Base Question Answering (Kapanipathi et al. 2021) from ACL Findings 2021.

The code also contains an implementation of the AMR aligner from Naseem et al. (2019) with the forced alignment introduced in Fernandez Astudillo et al. (2020).

Aside from the listed contributors, the initial commit was developed by Miguel Ballesteros and Austin Blodgett while at IBM.

IBM Internal Features

IBMers, please look here for available parsing web services, CCC installers/trainers, trained models, etc.

Installation

Just clone and pip install (see set_environment.sh below if you use a virtualenv):

git clone git@github.ibm.com:mnlp/transition-amr-parser.git
cd transition-amr-parser
pip install .  # use --editable if you plan to modify code

We use a set_environment.sh script inside which we activate conda/pyenv and virtual environments; it can contain, for example,

[ ! -d venv ] && virtualenv venv
. venv/bin/activate

You can leave this file empty if you don't want to use it:

touch set_environment.sh

The train and test scripts always source this script, i.e.

. set_environment.sh

This spares you from activating environments or setting system variables and other configuration each time, which helps when working with compute clusters.

To test that the install worked:

bash tests/correctly_installed.sh

To run a mini-test with the 25 annotated sentences we provide (this should take 1-3 minutes; it won't learn anything, but it will run all stages):

bash tests/minimal_test.sh

If you want to align AMR data, the aligner uses additional tools that can be downloaded and installed with

bash preprocess/install_alignment_tools.sh

See here for more installation details.

Training a model

You first need to pre-process and align the data. For AMR2.0, do

. set_environment.sh
python preprocess/merge_files.py /path/to/LDC2017T10/data/amrs/split/ DATA/AMR2.0/corpora/

You will also need to unzip the precomputed BLINK cache:

unzip /path/to/linkcache.zip

To launch training/testing, use (this will also run the aligner):

bash run/run_experiment.sh configs/amr2.0-action-pointer.sh

You can check training status with

python run/status.py --config configs/amr2.0-action-pointer.sh

Use --results to check the scores once models are finished.

Decode with Pre-trained model

To parse from the command line with a trained model, do

amr-parse -c $in_checkpoint -i $input_file -o file.amr

It will parse each line of $input_file separately (input is assumed tokenized). $in_checkpoint is the PyTorch checkpoint of a trained model. file.amr will contain the AMR in PENMAN notation with additional alignment information as comments. Use the flag --service together with -c for an interactive parsing mode.
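Since file.amr interleaves "# ::" comment lines with the PENMAN graphs, a small helper can separate the two when post-processing the output. This is only a sketch: the exact comment keys (::tok here, and whatever key carries the alignments) are assumptions, so check them against your actual output.

```python
# Sketch: split one AMR block from file.amr into its "# ::" metadata
# comments and the PENMAN graph. The comment keys used below (::tok)
# are assumptions -- verify against the real amr-parse output.
def split_amr_block(block):
    metadata, graph_lines = {}, []
    for line in block.splitlines():
        if line.startswith("# ::"):
            # e.g. "# ::tok The boy travels" -> key "tok", value "The boy travels"
            key, _, value = line[4:].partition(" ")
            metadata[key] = value
        elif line.strip():
            graph_lines.append(line)
    return metadata, "\n".join(graph_lines)

# Illustrative block; a real one comes from the amr-parse output file.
example = """\
# ::tok The boy travels
(t / travel-01
   :ARG0 (b / boy))"""

meta, graph = split_amr_block(example)
print(meta["tok"])  # The boy travels
```

Blocks in the output file are typically separated by blank lines, so splitting the file on "\n\n" before calling the helper is a reasonable starting point.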

To use the parser from other Python code with a trained model, do

from transition_amr_parser.parse import AMRParser
parser = AMRParser.from_checkpoint(in_checkpoint)
annotations = parser.parse_sentences([['The', 'boy', 'travels'], ['He', 'visits', 'places']])
# Penman notation
print(''.join(annotations[0][0]))
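Note that parse_sentences expects lists of tokens, not raw strings. If you start from plain text, you need a tokenizer first; below is a minimal sketch that only detaches trailing punctuation, which is enough for toy inputs (a real pipeline would use a proper tokenizer such as spaCy or NLTK).

```python
# Sketch: turn raw sentences into the token lists parse_sentences
# expects. This naive splitter only detaches trailing punctuation;
# use a real tokenizer (spaCy, NLTK, ...) for anything serious.
def naive_tokenize(text):
    tokens = []
    for word in text.split():
        trailing = []
        while word and word[-1] in ".,!?;:":
            trailing.append(word[-1])
            word = word[:-1]
        if word:
            tokens.append(word)
        tokens.extend(reversed(trailing))
    return tokens

batch = [naive_tokenize(s) for s in ["The boy travels.", "He visits places."]]
print(batch)  # [['The', 'boy', 'travels', '.'], ['He', 'visits', 'places', '.']]
```

The resulting batch can be passed directly to parser.parse_sentences(batch).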

About


License: Apache License 2.0
