klintan / swedish-ner-corpus

Small, semi-manually annotated web news corpus in Swedish for CoreNLP NER. 4 categories: PER, ORG, LOC and MISC.

Swedish manually annotated NER

Webbnyheter 2012 from Spraakbanken, semi-manually annotated and adapted for CoreNLP Swedish NER. "Semi-manually" here means: bootstrapped from Swedish gazetteers, then manually corrected and reviewed by two independent annotators who are native Swedish speakers. No inter-annotator agreement was calculated.

There might still be quality issues in the data, and the classes may be imbalanced. If you find errors, please create a pull request.
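To get a rough picture of the class imbalance mentioned above, one option is to count tokens per label. The sketch below assumes the corpus is stored in the two-column layout commonly used for CoreNLP NER training data (one token and its label per line, separated by whitespace or a tab, with blank lines between sentences). The function name `count_labels`, the sample sentence, and the `O` label for non-entity tokens are illustrative assumptions, not taken from this repository.

```python
from collections import Counter
from typing import Iterable


def count_labels(lines: Iterable[str]) -> Counter:
    """Count tokens per label in two-column (token, label) data.

    Assumption: each non-blank line holds a token followed by its label,
    separated by a tab or other whitespace. Lines with fewer than two
    columns are skipped.
    """
    counts: Counter = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # blank sentence separator or malformed line
        counts[parts[-1]] += 1  # the label is taken from the last column
    return counts


# Hypothetical sample in the assumed layout (not real corpus data).
sample = [
    "Anna\tPER\n",
    "arbetar\tO\n",
    "på\tO\n",
    "Volvo\tORG\n",
    "i\tO\n",
    "Göteborg\tLOC\n",
    "\n",
]

label_counts = count_labels(sample)
for label, n in label_counts.most_common():
    print(f"{label}\t{n}")
```

To run this on a real file, pass an open file handle instead of the sample list, for example `count_labels(open(path, encoding="utf-8"))`, where `path` points to whichever corpus file you are inspecting.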

4 categories: PER, ORG, LOC and MISC.

http://spraakbanken.gu.se/eng/resource/webbnyheter2012

License:Other