diegoceccarelli / erd-challenge

Entity Recognition and Disambiguation Challenge

This package contains code for the Entity Recognition and Disambiguation (ERD) Challenge. Setting everything up took me some time, so I am sharing it in the hope that it is useful to other teams participating in the challenge.

The package contains two utilities:

  • code to index the Freebase <-> Wikipedia mapping file provided by the organizers, in order to convert Wikipedia labels to Freebase ids (and to filter out invalid entities);
  • code to start the REST service (for now, only the short track).

Index the entity.tsv file

Once you have downloaded the entity.tsv file from the Datasets page, you can index it by running:

./script/index.sh entity.tsv mapdb

This command creates a folder named mapdb containing an index of all the mappings. At runtime you can access the index by creating a WikipediaToFreebase object:

WikipediaToFreebase wikiToFreebase = new WikipediaToFreebase("mapdb"); // the folder name

And then use:

wikiToFreebase.getLabel("Diego_Maradona");

to retrieve the Freebase id for the entity Diego_Maradona. Note that Wikipedia labels are case sensitive, and that I stripped the common prefix /wikipedia/en_title/ from the labels to keep the index compact.
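To make the prefix stripping and the case-sensitive lookup concrete, here is a minimal in-memory sketch. It is not the code in this package: the class WikiToFreebaseSketch, its methods addLine and lookup, the two-column line layout (Freebase id, then Wikipedia key) and the id /m/placeholder are all assumptions made for illustration, and a plain HashMap stands in for the on-disk index.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the real index: a plain in-memory map.
class WikiToFreebaseSketch {
    // Common prefix that is stripped to keep the keys compact.
    static final String PREFIX = "/wikipedia/en_title/";

    private final Map<String, String> ids = new HashMap<>();

    // Adds one tab-separated line. The layout "<freebase-id>\t<wikipedia-key>"
    // is an assumption; check the actual columns of entity.tsv.
    void addLine(String line) {
        String[] cols = line.split("\t");
        if (cols.length < 2) {
            return; // skip malformed lines
        }
        String key = cols[1].startsWith(PREFIX)
                ? cols[1].substring(PREFIX.length())
                : cols[1];
        ids.put(key, cols[0]);
    }

    // Case-sensitive lookup; returns null when the label is unknown.
    String lookup(String wikipediaLabel) {
        return ids.get(wikipediaLabel);
    }

    public static void main(String[] args) {
        WikiToFreebaseSketch map = new WikiToFreebaseSketch();
        // "/m/placeholder" is a made-up id, not a real Freebase id.
        map.addLine("/m/placeholder\t/wikipedia/en_title/Diego_Maradona");
        System.out.println(map.lookup("Diego_Maradona"));
        System.out.println(map.lookup("diego_maradona")); // different case, no match
    }
}
```

Because the keys are stored without the prefix, callers pass the bare label (Diego_Maradona), and a label that differs only in case is treated as a different entity.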

Rest Service

I set up a REST service for the short track, as required by the challenge. It exposes both a POST and a GET endpoint; the GET endpoint is useful for checking that everything works. To use the service, you only have to adapt the Annotator object so that it returns your list of Annotations. When you are ready, run the command

mvn jetty:run 

and the REST service will respond at the address:

http://$(your-ip-address):8080/erd-challenge/rest/shortTrack
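For a quick check of the GET endpoint, a test URL can be assembled by hand. The sketch below only builds the address and a URL-encoded query string; it does not send anything. The class ShortTrackRequestSketch and the field names runID, TextID and Text are assumptions for illustration, so check the challenge specification and the service code for the parameters actually expected.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper for building a short-track test request.
class ShortTrackRequestSketch {

    // Builds the service address from the path given above.
    static String endpoint(String host) {
        return "http://" + host + ":8080/erd-challenge/rest/shortTrack";
    }

    // Encodes the fields as application/x-www-form-urlencoded,
    // usable as a GET query string or as a POST body.
    static String formBody(Map<String, String> fields) {
        StringJoiner joiner = new StringJoiner("&");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            joiner.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // Field names are assumptions, not taken from this package.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("runID", "test-run");
        fields.put("TextID", "q1");
        fields.put("Text", "diego maradona");
        System.out.println(endpoint("localhost") + "?" + formBody(fields));
    }
}
```

Pasting the printed URL into a browser, or passing it to a command-line HTTP client, is enough to see whether the service started by mvn jetty:run is reachable.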

I hope this will help ;)

Diego
