ims-tcl / DeRE

A Task and Domain-Independent Slot Filling Framework for Declarative Relation Extraction

Setup

Requirements

  • Python 3.7+
  • git

Installing DeRE

To install (as user):

$ pip install .

To install (as developer):

$ pip install -e .  # editable
$ pip install -r dev_requirements.txt

For usage details, pass the --help flag either to the main command or to a subcommand (e.g. dere build --help):

$ dere --help
Usage: dere [OPTIONS] COMMAND [ARGS]...

Options:
  -v, --verbose  Show debug info
  -q, --quiet    Do less logging. Can be provided multiple times.
  --help         Show this message and exit.

Commands:
  build
  evaluate
  predict
  train

See also the tutorials.

Paper

DeRE: A Task and Domain-Independent Slot Filling Framework for Declarative Relation Extraction

Reference

If you plan to use DeRE, please cite:

@inproceedings{Adel2018,
  author = {Heike Adel and Laura Ana Maria Bostan and Sean Papay and Sebastian Pad\'{o} and Roman Klinger},
  title = {{DeRE}: A Task and Domain-Independent Slot Filling Framework for Declarative Relation Extraction},
  booktitle = {Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations},
  pages = {42--47},
  year = {2018},
  address = {Brussels, Belgium},
  month = {November},
  publisher = {Association for Computational Linguistics},
  url = {http://aclweb.org/anthology/D18-2008}
}

Tutorials

User

In this tutorial we show how you can use a pretrained model for an existing task (i.e. BioNLP'09 Shared Task on Event Extraction) to obtain predictions on an unlabeled dataset.

You have:

  • the BioNLP task already modeled at task-specs/bionlpst.xml
  • a pretrained model called baseline_trained.pkl located at tutorial/model/baseline_trained.pkl
  • an unlabeled corpus (in the BRAT format) located at tutorial/data/test

To use the pretrained model to generate predictions on the unlabeled corpus, and output them in the BRAT format at tutorial/data/predict, type the following command in your terminal:

$ python3 dere predict --model-path tutorial/model/baseline_trained.pkl --corpus-format BRAT --corpus-path tutorial/data/test --output tutorial/data/predict/

You can check the general usage for predict by running:

$ python3 dere predict --help

Application Developer

In this tutorial we show you how to formalize an Information Extraction task (i.e. BioNLP'09 Shared Task on Event Extraction) as a task specification, build a model for this task, train the model on the training set, and evaluate it on the test set of the corpus.

You have:

  • a labeled corpus split into train/test sets, located at tutorial/data/(train|test)
  • an XML task specification located at task-specs/bionlpst.xml

Then create a directory for the model and build the model with dere build:

$ mkdir tutorial/model
$ python3 dere build --task-spec task-specs/bionlpst.xml --model-spec model-specs/bionlpst-baseline.json --outfile tutorial/model/baseline.pkl

This will create a new, untrained model, which will be stored in the file tutorial/model/baseline.pkl.

To train the model on the training corpus you run:

$ python3 dere train --model-path tutorial/model/baseline.pkl --corpus-format BRAT --outfile tutorial/model/baseline_trained.pkl --corpus-path tutorial/data/train

The trained model baseline_trained.pkl can now be evaluated on the test corpus. First, predict the frames using the predict command:

$ python3 dere predict --model-path tutorial/model/baseline_trained.pkl --corpus-format BRAT --corpus-path tutorial/data/test --output tutorial/data/predict/

The predicted annotations are written to the .ann text files in tutorial/data/predict/.
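To get a quick overview of what was predicted, the .ann files can be inspected with a few lines of Python. The following is a minimal sketch, assuming the output follows the usual BRAT standoff layout (tab-separated lines whose IDs start with T for text-bound annotations and E for events). The function name count_annotations and the sample content are illustrative and not part of DeRE.

```python
from collections import Counter


def count_annotations(ann_text):
    """Count text-bound (T) and event (E) annotations by type in BRAT standoff text."""
    counts = Counter()
    for line in ann_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and annotator notes
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # not a well-formed annotation line
        ann_id, body = fields[0], fields[1]
        if ann_id.startswith("T"):
            # Text-bound annotation body: "Type start end"
            counts[body.split(" ")[0]] += 1
        elif ann_id.startswith("E"):
            # Event body: "Type:TriggerID Role:ArgID ..."
            counts["event:" + body.split(":")[0]] += 1
    return counts


# Made-up sample content; in practice, read a file from tutorial/data/predict/.
sample = (
    "T1\tProtein 0 5\tIL-10\n"
    "T2\tGene_expression 6 16\texpression\n"
    "E1\tGene_expression:T2 Theme:T1\n"
)
print(count_annotations(sample))
```

To apply this to a real prediction, pass the contents of one .ann file (read with open(path).read()) instead of the sample string.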

To evaluate the predictions against the gold annotations, run the evaluate command:

$ python3 dere evaluate --predicted tutorial/data/predict --gold tutorial/data/test --task-spec task-specs/bionlpst.xml --corpus-format BRAT

You can check the general usage for evaluate by running:

$ python3 dere evaluate --help

If you want to model your own task, you first need to write an XML task specification for it. The existing task specification files in task-specs/ in the DeRE repository can serve as examples. Save your file as task-specs/your_awesome_spec.xml.

The other dere commands work as shown above for the BioNLP task.

Model Developer

To implement a new model and use it within dere, do the following:

  • write a class that subclasses dere.models.Model, e.g.:
#!/usr/bin/env python

from dere.models import Model

class TutorialModel(Model):

    def train(self, corpus, dev_corpus=None):
        pass

    def predict(self, corpus):
        pass

Save this file as a Python script, for example tutorial_model.py, and place it in dere/models.

  • the new model must implement at least two methods: train and predict
  • train gets a Corpus as its first argument and, optionally, a second Corpus (a development corpus) as its second argument
  • predict gets a single Corpus
  • neither train nor predict returns anything: predict modifies the given corpus to add annotations, while train trains the model's classifier.
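The contract described in the list above can be illustrated with a toy model. This is only a sketch under stated assumptions: the Model class below is a stand-in for dere.models.Model so that the example runs without DeRE installed, and the corpus layout (a list of dicts with "tokens" and "annotations") is hypothetical and does not reflect DeRE's actual Corpus class.

```python
class Model:
    """Stand-in for dere.models.Model so the sketch runs without DeRE installed."""

    def train(self, corpus, dev_corpus=None):
        raise NotImplementedError

    def predict(self, corpus):
        raise NotImplementedError


class KeywordModel(Model):
    """Toy model: remembers annotated tokens from training and tags them at prediction."""

    def __init__(self):
        self.triggers = set()

    def train(self, corpus, dev_corpus=None):
        # Hypothetical corpus layout: a list of dicts with "tokens" and "annotations".
        for doc in corpus:
            self.triggers.update(doc["annotations"])
        # No return value: training only updates the model's internal state.

    def predict(self, corpus):
        for doc in corpus:
            # Annotations are written into the corpus that was passed in;
            # nothing is returned.
            doc["annotations"] = [t for t in doc["tokens"] if t in self.triggers]


train_corpus = [{"tokens": ["IL-10", "expression"], "annotations": ["expression"]}]
test_corpus = [{"tokens": ["TNF", "expression", "rises"], "annotations": []}]

model = KeywordModel()
model.train(train_corpus)
result = model.predict(test_corpus)
print(result, test_corpus[0]["annotations"])
```

The point of the sketch is the calling convention: the caller keeps its reference to the corpus and reads the new annotations from it after predict has run.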

To work with your new model within dere, use the interface introduced above and specify your model class during the build step as a "dotted name", e.g. tutorial_model.TutorialModel (the module's filename without the .py extension, followed by "." and the name of the implemented class).
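For readers unfamiliar with dotted names, the sketch below shows one common way such a string can be turned into a class in Python, using importlib from the standard library. It is an illustration of the general technique, not necessarily how dere resolves the name internally; the helper resolve_dotted_name is hypothetical.

```python
import importlib


def resolve_dotted_name(dotted_name):
    """Split 'module.ClassName' at the last dot, import the module, return the class."""
    module_name, _, class_name = dotted_name.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.ClassName', got {dotted_name!r}")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# Demonstrated on a standard-library class, since tutorial_model.py may not
# be importable in the environment where this sketch is run.
cls = resolve_dotted_name("collections.OrderedDict")
print(cls.__name__)
```

With tutorial_model.py on the import path, the same call with "tutorial_model.TutorialModel" would return the TutorialModel class.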

Again, to build the new model use:

$ python3 dere build tutorial_model.TutorialModel --task-spec task-specs/bionlpst.xml --outfile tutorial/model/tutorial.pkl

The rest of the commands work as introduced for User and Application Developer.
