rafaelanchieta / AMR_AS_GRAPH_PREDICTION

AMR AS GRAPH PREDICTION

This repository contains code for training and using the Abstract Meaning Representation model described in: AMR Parsing as Graph Prediction with Latent Alignment

If you use our code, please cite our paper as follows:

@inproceedings{Lyu2018AMRPA,
    title={AMR Parsing as Graph Prediction with Latent Alignment},
    author={Chunchuan Lyu and Ivan Titov},
    booktitle={Proceedings of the Annual Meeting of the Association for Computational Linguistics},
    year={2018}
}

Prerequisites:

Configuration:

  • Set up a Stanford CoreNLP server, which feature extraction relies on.
  • Change file paths in utility/constants.py accordingly.
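As a rough illustration of what the server provides, the sketch below asks a CoreNLP server for the POS, lemma and NER annotations mentioned in the preprocessing step. It is not the repository's own code: the URL (port 9000 is the CoreNLP server default) and the helper names build_request, extract_features and annotate are assumptions made for this example.

```python
import json
import urllib.parse
import urllib.request

# Assumption: a CoreNLP server is already running at this address.
CORENLP_URL = "http://localhost:9000"


def build_request(text, url=CORENLP_URL):
    """Build a POST request asking for tokens with POS, lemma and NER tags."""
    props = {"annotators": "tokenize,ssplit,pos,lemma,ner", "outputFormat": "json"}
    query = urllib.parse.urlencode({"properties": json.dumps(props)})
    return urllib.request.Request(f"{url}/?{query}", data=text.encode("utf-8"))


def extract_features(response):
    """Flatten a CoreNLP JSON response into (word, lemma, pos, ner) tuples."""
    return [
        (tok["word"], tok["lemma"], tok["pos"], tok["ner"])
        for sent in response["sentences"]
        for tok in sent["tokens"]
    ]


def annotate(text, url=CORENLP_URL):
    """Send text to the server and return the flattened token features."""
    with urllib.request.urlopen(build_request(text, url)) as resp:
        return extract_features(json.load(resp))
```

With a server running, annotate("The boy wants to go.") should return one tuple per token. If the request fails, check that the server is up before running the preprocessing script.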

Preprocessing:

Combine all *.txt files into a single one, and use Stanford CoreNLP to extract NER, POS and lemma features. The processed file is saved in the same folder.

    python src/preprocessing.py

Alternatively, process the output of the AMR-to-English aligner using the Java program in AMR_FEATURE (I used Eclipse to run it).
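The file-combining step can be sketched as follows. This is only an illustration under stated assumptions: the function name combine_txt_files and the output name combined.txt are hypothetical, and the paths that preprocessing.py actually expects are the ones set in utility/constants.py.

```python
from pathlib import Path


def combine_txt_files(folder, output_name="combined.txt"):
    """Concatenate every *.txt file in folder into a single file.

    A blank line is written after each part so that the last AMR entry of
    one file and the first entry of the next stay separated.
    """
    folder = Path(folder)
    output = folder / output_name
    # Skip the output file itself so that re-running does not duplicate data.
    parts = sorted(p for p in folder.glob("*.txt") if p.name != output_name)
    with output.open("w", encoding="utf-8") as out:
        for part in parts:
            text = part.read_text(encoding="utf-8")
            out.write(text.rstrip("\n") + "\n\n")
    return output
```

Files are concatenated in sorted name order, which keeps the result reproducible across runs.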

Build the copying dictionary and recategorization system (this step can be skipped, as the results are already in data/).

    python src/rule_system_build.py

Build the data into tensors.

    python src/data_build.py

Training:

The default model is saved in [save_to]/gpus_0valid_best.pt (save_to is defined in constants.py).

    python src/train.py

Testing

Load the model to parse from pre-built data.

    python src/generate.py -train_from [gpus_0valid_best.pt]

Evaluation

Please use amr-evaluation-tool-enhanced. It is based on Marco Damonte's amr-evaluation-tool, but with a correction concerning the unlabeled edge score.

Parsing

Parse a file where each line consists of a single sentence; the output is saved at [file]_parsed.

    python src/parse.py -train_from [gpus_0valid_best.pt] -input [file]

Or parse a single sentence given directly on the command line.

    python src/parse.py -train_from [gpus_0valid_best.pt] -text [type sentence here]
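For batch use, the input file can be prepared and the parser invoked from a small wrapper. The sketch below is an assumption-laden convenience, not part of the repository: the helper names write_sentences, parse_command and run_parser are hypothetical, and only the parse.py flags and the [file]_parsed output location come from the usage described above.

```python
import subprocess
from pathlib import Path


def write_sentences(sentences, path):
    """Write one sentence per line, collapsing any internal whitespace."""
    lines = [" ".join(s.split()) for s in sentences if s.strip()]
    Path(path).write_text("".join(line + "\n" for line in lines), encoding="utf-8")
    return len(lines)


def parse_command(model_path, input_path):
    """Build the src/parse.py invocation for a one-sentence-per-line file."""
    return ["python", "src/parse.py",
            "-train_from", str(model_path),
            "-input", str(input_path)]


def run_parser(model_path, input_path):
    """Run the parser from the repository root and return the output path."""
    subprocess.run(parse_command(model_path, input_path), check=True)
    return Path(str(input_path) + "_parsed")
```

Collapsing whitespace in write_sentences guards against a sentence with an embedded newline being read as two inputs.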

Pretrained models

Keeping the files under the data/ folder unchanged, download the model. This should allow one to run parsing.

Notes

The "python src/preprocessing.py" script starts from the sentences in the original AMR files, while the paper version was trained on the tokenized version provided by the AMR-to-English aligner, so the results could be slightly different. Also, to build a parser for out-of-domain data, please start preprocessing with "python src/preprocessing.py" to keep everything consistent.

Contact

Contact (chunchuan.lv@gmail.com) if you have any questions!
