kuhumcst / DanNet

The Danish WordNet as an RDF graph.

Home Page: https://wordnet.dk

Attach Supersenses to Synsets

simongray opened this issue

Supersenses, as seen in the English WordNet, have already been mapped 1:1 to DanNet's ontological types derived from the EuroWordNet ontology.

I have an Excel file supplied by Bolette to use for populating DanNet with Supersenses based on this mapping.

Supersenses

Princeton documentation: https://wordnet.princeton.edu/documentation/lexnames5wn

From email correspondence:

Bolette: Supersenses were popular during a certain period of WSD research because they made disambiguation more manageable in NLP. They are sometimes seen as an extension of NER. One could also use an ontology like the EuroWordNet Ontology, but for some reason supersenses became more widely used for WSD purposes in a series of papers. I have not seen a lot of work on supersenses in later years, though.

(...)

We refer among others to these two papers:

Massimiliano Ciaramita and Yasemin Altun. 2006. Broad-coverage sense disambiguation and information extraction with a supersense sequence tagger. In Proceedings of EMNLP, pages 594–602, Sydney, Australia, July.

Massimiliano Ciaramita and Mark Johnson. 2003. Supersense tagging of unknown nouns in WordNet. In Proceedings of the 2003 conference on Empirical methods in natural language processing, pages 168–175. Association for Computational Linguistics.

We worked with them in this paper:
https://aclanthology.org/2016.gwc-1.30.pdf

Another email (usage of Supersenses):

And a link to the corpus, including the Danish part: https://www.clarin.si/repository/xmlui/handle/11356/1842

That is the part we would initially like to link to supersenses.

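The Supersenses mapping is one-to-many: a single ontological type can map to several Supersenses. Fortunately, the alternatives all seem to be distinguished by part of speech.

The query will have to take this into account, i.e. select the Supersense that matches both the ontological type and the part of speech of the synset.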

Apparently, the only problematic rows are these two, where both candidates are nouns:

Plant+Object+Comestible		136	noun.food; noun.plant
Plant+Object+Part+Comestible	324	noun.food; noun.plant

so it may just come down to deciding whether edible plants count as food or as plants.
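A rough sketch of how the lookup could take part of speech into account, written in Python purely for illustration (it is not DanNet's actual query code). It assumes the part of speech can be read off the Supersense prefix (`noun.`, `verb.`, ...). The names `build_index` and `resolve` are hypothetical, and the `Dynamic+Physical` row is invented to show the unambiguous case; only the two `Plant+...` rows come from the spreadsheet.

```python
from collections import defaultdict

# (ontological type, supersenses) pairs mirroring the spreadsheet layout.
SAMPLE_ROWS = [
    ("Plant+Object+Comestible", "noun.food; noun.plant"),
    ("Plant+Object+Part+Comestible", "noun.food; noun.plant"),
    ("Dynamic+Physical", "noun.event; verb.motion"),  # invented example row
]


def build_index(rows):
    """Group the candidate supersenses by (ontological type, part of speech)."""
    index = defaultdict(list)
    for ontotype, supersenses in rows:
        for supersense in supersenses.split(";"):
            supersense = supersense.strip()
            if not supersense:
                continue
            # Assumption: the prefix before the dot is the part of speech.
            pos = supersense.split(".", 1)[0]
            index[(ontotype, pos)].append(supersense)
    return index


def resolve(index, ontotype, pos, preferences=()):
    """Return the single matching supersense, or None if missing or ambiguous.

    `preferences` is an ordered tie-breaker for rows that still have several
    candidates after filtering by part of speech.
    """
    candidates = index.get((ontotype, pos), [])
    if len(candidates) == 1:
        return candidates[0]
    for preferred in preferences:
        if preferred in candidates:
            return preferred
    return None


index = build_index(SAMPLE_ROWS)
print(resolve(index, "Dynamic+Physical", "verb"))         # verb.motion
print(resolve(index, "Plant+Object+Comestible", "noun"))  # None (ambiguous)
print(resolve(index, "Plant+Object+Comestible", "noun",
              preferences=("noun.food",)))                # noun.food
```

Whether edible plants should resolve to `noun.food` or `noun.plant` is exactly the open decision above; the `preferences` argument only makes that choice explicit in one place.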

Currently blocked by row 137:

noun.food	804	noun.substance

The first column should be an ontotype, but it has been replaced with a Supersense, making the ~800 synsets impossible to classify until the original authors of this mapping (e.g. Bolette) chime in.

I went with Natural+Substance after conferring with Sussi.
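For reference, a small Python sketch of how rows like this one could be caught and patched before import. This is illustrative only and not the code used in DanNet: the function names are hypothetical, the regular expression assumes Supersenses always look like `pos.name`, and the first sample row is included just for contrast.

```python
import re

# Assumption: a supersense is a POS prefix, a dot, and a name (e.g. noun.food).
SUPERSENSE_PATTERN = re.compile(r"^(noun|verb|adj|adv)\.[A-Za-z]+$")

# (first column, synset count, supersenses) as in the spreadsheet.
MAPPING_ROWS = [
    ("Plant+Object+Comestible", 136, "noun.food; noun.plant"),
    ("noun.food", 804, "noun.substance"),  # the broken row 137
]


def find_misplaced_supersenses(rows):
    """Return first-column values that look like a supersense, not an ontotype."""
    return [first for first, *_ in rows if SUPERSENSE_PATTERN.match(first.strip())]


def apply_overrides(rows, overrides):
    """Replace broken first-column values with manually chosen ontotypes."""
    return [(overrides.get(first, first), *rest) for first, *rest in rows]


print(find_misplaced_supersenses(MAPPING_ROWS))  # ['noun.food']

# The manual decision recorded in this issue.
fixed_rows = apply_overrides(MAPPING_ROWS, {"noun.food": "Natural+Substance"})
print(fixed_rows[1])  # ('Natural+Substance', 804, 'noun.substance')
```

Keeping the override in a separate dictionary leaves the supplied spreadsheet untouched and makes the manual decision easy to revisit.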