SapienzaNLP / ewiser

A Word Sense Disambiguation system integrating implicit and explicit external knowledge.

What do you mean by WordNet offset?

brucewlee opened this issue · comments

Why not simply use a sense key? Why is there a sensekeys2offsets conversion table?

Furthermore, in an application like this:

```python
import spacy
from ewiser.spacy.disambiguate import Disambiguator
from spacy.language import Language
import utils

nlp = spacy.load("en_core_web_sm", disable=['parser', 'ner'])

@Language.factory('wsd')
def wsd_engine(nlp, name):
    return Disambiguator('ewiser/ewiser.semcor+wngt.pt', lang="en")

nlp.add_pipe('wsd', last=True)

# example
doc = nlp("Have you ever wondered how you are able to remember things for a long time?")

for w in doc:
    print(w.text)
    if w._.offset:
        sensekey = utils.offsets2sensekeys(w._.offset, w.lemma_)
        print(sensekey)
```

Why isn't the model predicting a WN offset for every word? For "Have you ever wondered how you are able to remember things for a long time?", the model only gives output for:

ever%4:02:04::
wonder%2:32:01::
able%3:00:00::
remember%2:31:00::
thing%1:10:00::
long%3:00:02::
time%1:28:05::

Hi,

  1. Offsets are the unique IDs assigned to synsets (groups of senses). EWISER performs classification at the synset level, which is why we output WN offsets. As you seem to have done already, judging from your code snippet, offsets can be mapped to sense keys quite easily given the lemma. I am not planning to add sense keys to the spaCy plugin for the foreseeable future, but we welcome PRs :)

  2. WN only covers nouns, verbs, adjectives, and adverbs--with some exceptions, e.g. interrogative adverbs. EWISER can't predict senses for anything else. In your example sentence, the only words that belong to one of those parts of speech but are (correctly) not disambiguated are "have" and "are", as they function as an auxiliary verb and a copula, respectively.
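To illustrate point 1, here is a minimal sketch of why the lemma is needed to go from an offset to a sense key: a synset (one offset) can contain several lemmas, and each lemma in it has its own sense key. This is not the `utils.offsets2sensekeys` helper from the question and not EWISER's conversion table. The `wn:<8 digits><pos>` offset format, the table contents, the function names, and the `lemma_a`/`lemma_b` keys are all placeholders assumed for illustration. Only the sense-key layout (`lemma%ss_type:lex_filenum:lex_id:head_word:head_id`) follows the WordNet sense index format.

```python
from typing import Dict, List, NamedTuple, Optional

# WordNet ss_type digits in a sense key -> part-of-speech letter.
# 1 = noun, 2 = verb, 3 = adjective, 4 = adverb, 5 = adjective satellite.
SS_TYPE_TO_POS = {1: "n", 2: "v", 3: "a", 4: "r", 5: "s"}


class SenseKey(NamedTuple):
    lemma: str
    pos: str
    lex_filenum: int
    lex_id: int


def parse_sense_key(key: str) -> SenseKey:
    """Split a key such as 'wonder%2:32:01::' into its main fields."""
    lemma, rest = key.split("%", 1)
    # The last two fields (head_word, head_id) are empty unless the
    # sense is an adjective satellite; they are ignored here.
    ss_type, lex_filenum, lex_id, _head_word, _head_id = rest.split(":")
    return SenseKey(lemma, SS_TYPE_TO_POS[int(ss_type)], int(lex_filenum), int(lex_id))


def offset_to_sense_key(
    offset: str, lemma: str, table: Dict[str, List[str]]
) -> Optional[str]:
    """Return the sense key of `lemma` within the synset `offset`, if any."""
    for key in table.get(offset, []):
        if parse_sense_key(key).lemma == lemma.lower():
            return key
    return None


# Placeholder table: the offset and both keys are made up for illustration.
TOY_TABLE = {
    "wn:00000001n": ["lemma_a%1:00:00::", "lemma_b%1:00:00::"],
}

print(parse_sense_key("wonder%2:32:01::"))
print(offset_to_sense_key("wn:00000001n", "lemma_b", TOY_TABLE))
```

The same offset resolves to a different sense key depending on the lemma passed in, which is why the snippet in the question needs both `w._.offset` and `w.lemma_`.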

Thanks for the comment. I'll make the pull request with appropriate modifications.