abhishekkrthakur / GermaNER

GermaNER: Free Open German Named Entity Recognition Tool

#GermaNER - Free Open German Named Entity Recognition Tool

GermaNER is licensed under ASL 2.0 and other lenient licenses, allowing its use for academic and commercial purposes without restrictions.

##GermaNER in three lines

To tag German texts:

  1. Download the binary from here or, if you don't have enough memory, use GermaNER without Freebase features from here.
  2. Tokenize your text so that there is one word per line, with a blank line between sentences. Read the details [here](https://github.com/tudarmstadt-lt/GermaNER/blob/master/germaner/src/main/java/de/tu/darmstadt/lt/ner/doc/File-Format.md).
  3. Run the jar file as follows (see details here):

java -Xmx4g -jar GermaNER-09-09-2015.jar -t YourTokenizedTestFile -o OutputFileName

                          OR (if you have less memory)

java -Xmx1300m -jar GermaNER-nofb-09-09-2015.jar -t YourTokenizedTestFile -o OutputFileName

The tagged document will be under output/result.tsv
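
The input format from step 2 (one token per line, a blank line between sentences) can be produced with a small script. The following is a minimal Python sketch, not part of GermaNER: the function name `to_token_per_line` and both regular expressions are assumptions made for illustration. The sentence splitter is deliberately naive and will break on abbreviations such as "z. B.", and hyphenated words are split at the hyphen, so a proper German tokenizer is preferable for real data. The linked File-Format document remains the authoritative description of the format.

```python
import re

# Naive sentence boundary (assumption): '.', '!' or '?' followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")
# A token is a run of word characters or a single punctuation mark.
# In Python 3, \w matches Unicode letters, so umlauts and eszett are kept.
_TOKEN = re.compile(r"\w+|[^\w\s]")


def to_token_per_line(text):
    """Return text with one token per line and a blank line between sentences."""
    blocks = []
    for sentence in _SENTENCE_END.split(text.strip()):
        tokens = _TOKEN.findall(sentence)
        if tokens:
            blocks.append("\n".join(tokens))
    # Sentences are separated by one blank line; the file ends with a newline.
    return "\n\n".join(blocks) + "\n" if blocks else ""
```

Writing the returned string to a file gives something that can be passed to the `-t` switch shown above.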

- NEW

##Train GermaNER with your own training file and feature files

If you would like to train GermaNER with your own training file, or with our training file from here but with different feature files, proceed as follows:

  • Get the data.zip file from here and change the contents of any files as needed. Once done, zip the files back up as data.zip.
  • Get the config file, config.properties, here. Set useFreeBase=0 if you do not have enough memory. If you have lookup feature files like this, set lookUpFeature=1. If you have list feature files like this, set listFeature=1.
  • Get the GermaNER jar file from here. This jar file is only meant for training an NER model on a new dataset or with modified features. It does not contain a usable NER model.
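
The switches in config.properties can be set by hand, or with a short script when several configurations need to be tried. The following Python sketch is an illustration only: the helper `set_properties` is hypothetical, and the only key names taken from the text above are `useFreeBase`, `lookUpFeature` and `listFeature`. It assumes plain `key=value` lines and does not handle the other separators that Java properties files allow.

```python
def set_properties(text, overrides):
    """Return properties text with the given keys set; other lines are kept as-is."""
    remaining = dict(overrides)
    out = []
    for line in text.splitlines():
        stripped = line.strip()
        is_comment = stripped.startswith(("#", "!"))
        key = stripped.split("=", 1)[0].strip() if "=" in stripped else None
        if not is_comment and key in remaining:
            # Replace the value of an existing key.
            out.append(f"{key}={remaining.pop(key)}")
        else:
            out.append(line)
    # Append keys that were not present in the original text.
    out.extend(f"{k}={v}" for k, v in remaining.items())
    return "\n".join(out) + "\n"
```

For a low-memory run with lookup features enabled, one would pass `{"useFreeBase": "0", "lookUpFeature": "1"}` as `overrides` and write the result back to config.properties.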

For training and testing at the same time, run it as follows:

java -jar GermaNER-train-04-12-2016.jar -f YOURTRAINFILE -t YOURTESTFILE -r data.zip -d MODELDIR -o OUTPUTFILENAME -c config.properties

For testing only, once you have run the above command and the NER model is under MODELDIR, run it without the -f switch as follows:

java -jar GermaNER-train-04-12-2016.jar -t YOURTESTFILE -r data.zip -d MODELDIR -o OUTPUTFILENAME -c config.properties
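
The two invocations above differ only in the `-f` switch, which makes them easy to drive from a script. The following Python sketch builds the argument list for either mode. The function name `germaner_train_command` and its parameter names are assumptions for illustration; the switches and default file names are copied from the commands above.

```python
from typing import List, Optional


def germaner_train_command(
    test_file: str,
    output_file: str,
    model_dir: str,
    train_file: Optional[str] = None,
    jar: str = "GermaNER-train-04-12-2016.jar",
    resources: str = "data.zip",
    config: str = "config.properties",
) -> List[str]:
    """Build the java command; include -f only when a training file is given."""
    cmd = ["java", "-jar", jar]
    if train_file is not None:
        # Train and test in one run.
        cmd += ["-f", train_file]
    cmd += ["-t", test_file, "-r", resources, "-d", model_dir,
            "-o", output_file, "-c", config]
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)`. Leaving `train_file` as `None` gives the test-only command, which expects a model to already exist under the model directory.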

- NEW END

License: Other

Languages: Java 95.0%, Perl 5.0%