Attach Supersenses to Synsets
simongray opened this issue
Supersenses, as seen in the English WordNet, have already been mapped 1:1 to DanNet's ontological types derived from the EuroWordNet ontology.
I have an Excel file supplied by Bolette to use for populating DanNet with Supersenses based on this mapping.
Supersenses
Princeton documentation: https://wordnet.princeton.edu/documentation/lexnames5wn
From email correspondence:
Bolette: Supersenses were popular in a certain period of WSD investigations because they made disambiguation more manageable in NLP. They are sometimes seen as an extension of NER. One could also use an ontology like the EuroWordNet Ontology, but for some reason supersenses became more widely used for WSD purposes in a series of papers. I have not seen a lot of work on supersenses in later years, though.
(...)
We refer among others to these two papers:
Massimiliano Ciaramita and Yasemin Altun. 2006. Broad-coverage sense disambiguation and information extraction with a supersense sequence tagger. In Proceedings of EMNLP, pages 594–602, Sydney, Australia, July.
Massimiliano Ciaramita and Mark Johnson. 2003. Supersense tagging of unknown nouns in WordNet. In Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing, pages 168–175. Association for Computational Linguistics.
We worked with them in this paper:
https://aclanthology.org/2016.gwc-1.30.pdf
Another email (usage of Supersenses):
And a link to the corpus, including the Danish part: https://www.clarin.si/repository/xmlui/handle/11356/1842
This is the part we would initially like to link to supersenses.
The Supersenses mapping is 1-to-many, but fortunately the multiple targets all seem to be separated by part-of-speech.
The query will have to take this into account.
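A minimal sketch of what "taking this into account" could look like, assuming each mapping cell is a `;`-separated list of supersenses and that the part-of-speech can be read from the prefix before the dot. The function name `select_by_pos` and the `noun.act; verb.motion` cell are made up for illustration; they are not taken from the Excel file or from DanNet's actual query.

```python
def select_by_pos(cell: str, pos: str) -> list[str]:
    """Return the supersenses in a ';'-separated cell whose prefix equals pos.

    pos is assumed to be one of the lexname prefixes: noun, verb, adj, adv.
    """
    senses = [s.strip() for s in cell.split(";") if s.strip()]
    return [s for s in senses if s.split(".", 1)[0] == pos]


# Hypothetical cell with one supersense per part-of-speech.
print(select_by_pos("noun.act; verb.motion", "verb"))

# A cell where both candidates share a part-of-speech stays ambiguous.
print(select_by_pos("noun.food; noun.plant", "noun"))
```

If the function returns more than one supersense for a synset's part-of-speech, the row needs a manual decision.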
Apparently, the only problematic rows are these:
Plant+Object+Comestible 136 noun.food; noun.plant
Plant+Object+Part+Comestible 324 noun.food; noun.plant
so it may just come down to deciding whether edible plants are food or plants.
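One way to encode that decision is an explicit preference that is only consulted when a cell lists more than one supersense. This is a sketch under assumptions: `resolve_supersense` is a hypothetical helper, and the choice of `noun.plant` in the example is arbitrary, not a decision made in this issue.

```python
def resolve_supersense(cell: str, preferred: str) -> str:
    """Pick one supersense from a ';'-separated cell.

    Unambiguous cells are returned as-is; ambiguous cells fall back to
    the preferred supersense and fail loudly if it is not listed.
    """
    senses = [s.strip() for s in cell.split(";") if s.strip()]
    if len(senses) == 1:
        return senses[0]
    if preferred in senses:
        return preferred
    raise ValueError(f"cannot resolve {cell!r} with preference {preferred!r}")


# The preference here is illustrative only.
print(resolve_supersense("noun.food; noun.plant", preferred="noun.plant"))
```

Raising an error rather than silently picking the first entry keeps any new ambiguous rows visible if the mapping file changes.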
Currently blocked by row 137:
noun.food 804 noun.substance
The first column should be an ontotype, but it has been replaced with a Supersense, making the ~800 synsets impossible to classify until the original authors of this mapping (e.g. Bolette) chime in.
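A check along these lines could flag such rows before the mapping is imported. It is a sketch that assumes rows have been read into `(first column, count, supersenses)` tuples and that a supersense always looks like a part-of-speech prefix, a dot, and a name; the function name and the regular expression are illustrative, not part of DanNet.

```python
import re

# Assumed shape of a supersense: pos prefix, dot, alphabetic name.
SUPERSENSE = re.compile(r"^(noun|verb|adj|adv)\.[A-Za-z]+$")


def misplaced_supersenses(rows):
    """Return rows whose first column looks like a supersense, not an ontotype."""
    return [row for row in rows if SUPERSENSE.match(row[0].strip())]


rows = [
    ("Plant+Object+Comestible", 136, "noun.food; noun.plant"),
    ("noun.food", 804, "noun.substance"),
]
print(misplaced_supersenses(rows))
```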
I went with Natural+Substance after conferring with Sussi.