LeonardoEmili / tamr

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

tamr

A transition-based AMR parser along with an aligner tuned by the parser. Used in our EMNLP 2018 paper An AMR Aligner Tuned by Transition-based Parser.

Notion

In the following sections, we will use the following notions:

  • ${TAMR_HOME}: the root directory of the project
  • ${TAMR_ALIGNER}: the directory of the AMR aligner, which equals to ${TAMR_HOME}/amr_aligner
  • ${TAMR_PARSER}: the directory of the transition-based aligner, which equals to ${TAMR_HOME}/amr_parser

Aligner

The code for AMR aligner is under ${TAMR_ALIGNER}.

Pre-requisites

Prepare resource

We use word2vec for semantic matching. See the README.md for more information about filtering wordvec.

Prepare data

Our alignment is built on the JAMR alignment results. You can get the input data with the following commends:

pushd "$JAMR_HOME" > /dev/null
. scripts/config.sh
scripts/ALIGN.sh < /path/to/your/input/data > /path/to/your/baseline/data

Run the Aligner

Go into ${TAMR_ALIGNER} and run the following commands:

python rule_base_align.py \
    -verbose \
    -data \
    /path/to/your/baseline/data \
    -output \
    /path/to/your/alignment/data \
    -wordvec \
    /path/to/your/wordvec/data \
    -trials \
    10000 \
    -improve_perfect \
    -morpho_match \
    -semantic_match

The quality of an alignment is evaluated by the smatch score of the graph it leads to. Here using -improve_perfect will update the alignment even with the baseline alignment achieve an smatch score of 1.0.

The output alignment is shown as blocks of results in the following format:

id
# ::alignment:

[2018/12/20 update] old replace_comments.py does not update the alignment in # ::node fields which was used in eager_oracle.py. Please use the refresh_alignments.py script to generate new alignment data. Thanks @jcyk for bug shooting!

After getting the alignment, use the following commands to generate new alignment:

python refresh_alignments.py \
    -lexicon \
    /path/to/your/alignment/data \
    -data \
    /path/to/your/baseline/data \
    > /path/to/your/new/alignment/data

You can also use refresh_alignments.py to yield aligned AMR file for LDC2014T12 with the alignment we release.

Parser

Pre-requisites

  • cmake
  • c++ supporting c++11
  • eigen

Build

Before compiling, you need to fetch the dynet and dynet_layer with

git submodule init
git submodule update

under ${TAMR_HOME}.

After fetching the submodules, run the following commends.

cd amr_parser
mkdir build
cd build
cmake .. -DEIGEN3_INCLUDE_DIR=/path/to/your/eigen/
make

The compilation will generate an executable under ${TAMR_PARSER}/bin/.

Prepare data

After getting your data with alignment, do run the ${TAMR_ALIGNER}/eager_oracle.py to generate training action file for the alignment as

python eager_oracle.py \
    -mod \
    dump \
    -aligned \
    /path/to/your/new/alignment/data \
    > /path/to/your/actions

Training the Parser

With the following commands under $TAMR_PARSER:

./amr_parser/bin/parser_l2r \
    --dynet-seed \
    1 \
    --train \
    --training_data \
    /path/to/your/new/actions/training/data \
    --devel_data \
    /path/to/your/new/actions/dev/data \
    --test_data \
    /path/to/your/new/actions/test/data \
    --pretrained \
    /path/to/your/embedding/file \
    --model \
    data/little_prince/model \
    --optimizer_enable_eta_decay \
    true \
    --optimizer_enable_clipping \
    true \
    --external_eval \
    ./amr_parser/scripts/eval_eager.sh \
    --devel_gold \
    /path/to/your/new/alignment/dev/data \
    --test_gold \
    /path/to/your/new/alignment/test/data \
    --max_iter \
    1

Released Alignments

You can find our alignment for LDC2014T12 under ${TAMR_HOME}/release/ldc2014t12. Since JAMR and CAMR use different tokenization, our release includes the alignment processed with cdec tokenization and stanford tokenization.

You can find our alignment for LDC2014T12 under ${TAMR_HOME}/release/ldc2017t10. Our release only contains alignment processed with cdec tokenization.

Pipeline Script

We demonstrate the process in the pipeline.sh script.

Awesome AMR

Our alignment helps other AMR parser to achieve better performance. We show how to hack into several open-source AMR parser and replace their alignment with ours in the awesome.md.

Contact

Yijia Liu <yjliu@ir.hit.edu.cn>

About


Languages

Language:Python 52.2%Language:C++ 38.2%Language:Cython 5.3%Language:CMake 3.2%Language:Shell 0.9%Language:C 0.1%