jenojp / negspacy

spaCy pipeline object for negating concepts in text

How can I get this to work?

SallyBean opened this issue

I think I'm missing something here and can't seem to resolve it.

The code works with the example texts provided in much of the documentation (e.g. "She does not like Steve Jobs but likes Apple products."), and the term 'cannot' appears in the termset - how can I identify these simple negations? Please note the print is indented in the original code.

Here's my code:

pip install negspacy

import spacy
from negspacy.negation import Negex
from negspacy.termsets import termset

ts = termset("en")

nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("negex", config={"ent_types":["PERSON","ORG"]})

doc = nlp("Men cannot play football.")
for e in doc.ents:
    print(e.text, e._.negex)

Hi - there could be two issues here:

  1. You're limiting negation to entities of type PERSON or ORG.
  2. Even if you weren't limiting types, "en_core_web_sm" finds no entities in your example sentence, so there is nothing to negate.
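
To make the two points concrete, here is a toy sketch in plain Python. It is not negspacy's implementation; `flag_negated`, its arguments, and the tiny term list are hypothetical and exist only to show that negation flags attach to entities, so a sentence with no entities (or only entities of excluded types) gives nothing to print.

```python
# Toy illustration only: this is NOT how negspacy is implemented.
NEGATION_TERMS = {"cannot", "not", "no"}  # hypothetical mini term list


def flag_negated(tokens, entities, ent_types=None):
    """Return (entity_text, negated) pairs for entities of the allowed types.

    tokens    -- list of lowercase words in the sentence
    entities  -- list of (token_index, text, label) tuples
    ent_types -- labels to consider, or None to consider every label
    """
    results = []
    for index, text, label in entities:
        if ent_types is not None and label not in ent_types:
            continue  # skipped, similar in spirit to restricting ent_types
        # The entity counts as negated if a negation term precedes it.
        negated = any(tok in NEGATION_TERMS for tok in tokens[:index])
        results.append((text, negated))
    return results


tokens = ["men", "cannot", "play", "football"]

# Case 1: no entities were found, so there is nothing to flag.
no_entities = flag_negated(tokens, entities=[])

# Case 2: an entity exists, but its label is excluded by ent_types.
filtered = flag_negated(
    tokens, [(3, "football", "SPORT")], ent_types=["PERSON", "ORG"]
)

# Case 3: an entity exists and is allowed; the preceding "cannot" negates it.
flagged = flag_negated(tokens, [(3, "football", "SPORT")])

print(no_entities, filtered, flagged)
```

Only the third case produces a result, which is why the sentence needs an entity of an allowed type before any negation can be reported.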

You can create an entity ruler. See the example below, where "football" is negated by "cannot", which is a preceding term in the "en" termset. If you want to change termsets, see this example.

import spacy
from negspacy.negation import Negex
from negspacy.termsets import termset

ts = termset("en")

nlp = spacy.load("en_core_web_sm")

ruler = nlp.add_pipe("entity_ruler")
patterns = [{"label": "SPORT", "pattern": "football"},
            {"label": "SPORT", "pattern": [{"LOWER": "ice"}, {"LOWER": "hockey"}]}]
ruler.add_patterns(patterns)

nlp.add_pipe(
    "negex",
    config={
        "neg_termset":ts.get_patterns()
    }
)

doc = nlp("Men cannot play football.")

for e in doc.ents:
    print(e.text, e._.negex)

Thanks so much for your help! This makes sense. Apologies for the delay in getting back to you.

Although, if I replace 'football' with 'hockey' in the doc - nothing is returned - am I missing something else?

Huge apologies, I'm very new to this and learning.

So for the example code I pasted above, it's looking specifically for 'ice hockey' not just 'hockey'. If you changed the patterns to remove 'ice' as shown below then it would work.

            {"label": "SPORT", "pattern": [{"LOWER": "hockey"}]}]
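
If you want both "hockey" and "ice hockey" to be picked up, one option is to make the "ice" token optional with the "OP": "?" operator rather than removing it. The sketch below checks only the entity ruler step. It uses spacy.blank("en") instead of "en_core_web_sm" purely to avoid a model download, and leaves the negex component out; both are simplifications for this example, and `sport_entities` is a hypothetical helper.

```python
import spacy

# Assumption for this sketch: a blank English pipeline stands in for
# spacy.load("en_core_web_sm") so that no model download is needed.
nlp = spacy.blank("en")

ruler = nlp.add_pipe("entity_ruler")
patterns = [
    {"label": "SPORT", "pattern": "football"},
    # "OP": "?" makes the "ice" token optional, so the pattern
    # covers both "hockey" and "ice hockey".
    {"label": "SPORT", "pattern": [{"LOWER": "ice", "OP": "?"}, {"LOWER": "hockey"}]},
]
ruler.add_patterns(patterns)


def sport_entities(text):
    """Return (text, label) for each entity the ruler finds (hypothetical helper)."""
    return [(ent.text, ent.label_) for ent in nlp(text).ents]


print(sport_entities("Men cannot play hockey."))
print(sport_entities("Men cannot play ice hockey."))
```

Once the entities show up here, adding the "negex" pipe after the ruler as in the earlier example should let you read e._.negex on each of them.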

Thanks @jenojp, this is really helpful!