bplank / danish_ner_transfer

Danish Named Entity Recognition (NER)

Danish Named Entity Recognition (NER) - Cross-lingual Transfer

Repository accompanying the paper on cross-lingual transfer for Danish Named Entity Recognition (NER) by Plank (2019), presented at NoDaLiDa: "Neural Cross-Lingual Transfer and Limited Annotated Data for Named Entity Recognition in Danish" [paper] [data]

Abstract

Named Entity Recognition (NER) has greatly advanced by the introduction of deep neural architectures. However, the success of these methods depends on large amounts of training data. The scarcity of publicly-available human-labeled datasets has resulted in limited evaluation of existing NER systems, as is the case for Danish. This paper studies the effectiveness of cross-lingual transfer for Danish, evaluates its complementarity to limited gold data, and sheds light on performance of Danish NER.

Data overview

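| | Dev | Test | Train Tiny (5k) | Train Small (10k) |
|---|---|---|---|---|
| Sentences | 564 | 565 | 272 | 604 |
| Tokens | 10,332 | 10,023 | 4,669 | 10,069 |
| Types | 3,640 | 3,424 | 1,918 | 3,525 |
| TTR | 0.35 | 0.34 | 0.41 | 0.35 |
| Sentences with NE | 220 | 226 | 96 | 206 |
| Sentences with NE (%) | 39% | 34% | 35% | 34% |
| Entities | 347 | 393 | 153 | 341 |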

Overview of the annotated Danish NER data. Around 35%-39% of the sentences contain named entities (NEs). TTR: type-token ratio. Table reproduced from Plank (2019).
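
The TTR row follows directly from the Types and Tokens rows: it is the number of distinct word types divided by the number of tokens. The snippet below is a minimal sketch that recomputes it from the counts in the table. The function name `type_token_ratio` and the `SPLITS` dictionary are illustrative assumptions and are not part of this repository.

```python
def type_token_ratio(num_types: int, num_tokens: int) -> float:
    """Return the type-token ratio: distinct word types / total tokens."""
    if num_tokens <= 0:
        raise ValueError("num_tokens must be positive")
    return num_types / num_tokens


# (types, tokens) per split, copied from the data overview table.
# The dictionary layout is only for illustration.
SPLITS = {
    "Dev": (3640, 10332),
    "Test": (3424, 10023),
    "Train Tiny (5k)": (1918, 4669),
    "Train Small (10k)": (3525, 10069),
}

for name, (types, tokens) in SPLITS.items():
    print(f"{name}: TTR = {type_token_ratio(types, tokens):.2f}")
```

Rounded to two decimals, the results agree with the TTR row. The smaller Train Tiny split has the highest ratio, which is expected because TTR tends to decrease as a corpus grows.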

References

If you use this resource, please cite the paper and the references to the Danish UD data:

  • Plank, Barbara. Neural Cross-Lingual Transfer and Limited Annotated Data for Named Entity Recognition in Danish. NoDaLiDa, 2019.

  • Johannsen, Anders, Martínez Alonso, Héctor and Plank, Barbara. Universal Dependencies for Danish. TLT14, 2015.

  • Buch-Kromann, Matthias T., Line Mikkelsen, and Stine Kern Lynge. Danish dependency treebank. TLT, 2003.

  • Keson, Britt. Documentation of The Danish Morpho-syntactically Tagged PAROLE Corpus. Technical report, DSL, 1998. Or:

  • Bilgram, Thomas and Keson, Britt. The Construction of a Tagged Danish Corpus. In NoDaLiDa, 1998.

Main reference for the NER annotation on top of Danish UD (DDT):

@inproceedings{plank-2019-neural,
    title = "Neural Cross-Lingual Transfer and Limited Annotated Data for Named Entity Recognition in {D}anish",
    author = "Plank, Barbara",
    booktitle = "Proceedings of the 22nd Nordic Conference on Computational Linguistics",
    month = sep # "{--}" # oct,
    year = "2019",
    address = "Turku, Finland",
    publisher = {Link{\"o}ping University Electronic Press},
    url = "https://www.aclweb.org/anthology/W19-6143",
    pages = "370--375",
    abstract = "Named Entity Recognition (NER) has greatly advanced by the introduction of deep neural architectures. However, the success of these methods depends on large amounts of training data. The scarcity of publicly-available human-labeled datasets has resulted in limited evaluation of existing NER systems, as is the case for Danish. This paper studies the effectiveness of cross-lingual transfer for Danish, evaluates its complementarity to limited gold data, and sheds light on performance of Danish NER.",
}
