marcotcr / anchor

Code for "High-Precision Model-Agnostic Explanations" paper

Issue with Spacy and the en_core_web_lg.

EmanueleLM opened this issue

Hi,

I'm trying to run this simple snippet after installing anchor, spaCy, and all the requirements without any errors or warnings (including running 'python -m spacy download en_core_web_lg'):

import spacy
from anchor import anchor_text

nlp = spacy.load('en_core_web_lg')
explainer = anchor_text.AnchorText(nlp, ['negative', 'positive'], use_unk_distribution=False, use_bert=False)

But I obtain the following error:

Exception                                 Traceback (most recent call last)
<ipython-input-5-7f4e7f3d6066> in <module>
----> 1 explainer = anchor_text.AnchorText(nlp, ['negative', 'positive'], use_unk_distribution=False, use_bert=False)

~/.local/lib/python3.7/site-packages/anchor/anchor_text.py in __init__(self, nlp, class_names, use_unk_distribution, use_bert, mask_string)
    117         self.tg = None
    118         self.use_bert = use_bert
--> 119         self.neighbors = utils.Neighbors(self.nlp)
    120         self.mask_string = mask_string
    121         if not self.use_unk_distribution and self.use_bert:

~/.local/lib/python3.7/site-packages/anchor/utils.py in __init__(self, nlp_obj)
    319         self.to_check = [w for w in self.nlp.vocab if w.prob >= -15 and w.has_vector]
    320         if not self.to_check:
--> 321             raise Exception('No vectors. Are you using en_core_web_sm? It should be en_core_web_lg')
    322         self.n = {}
    323 

Exception: No vectors. Are you using en_core_web_sm? It should be en_core_web_lg

I'm using this setting:

Fedora 30 (but I can replicate it on Ubuntu 18.04)
python 3.7.4
spacy 2.3.2 (but I've also tried with 2.2.3)

Thank you,
Emanuele

I can add that if I replace line 319 of ~/.local/lib/python3.7/site-packages/anchor/utils.py (the file shown in the traceback), i.e.

    319         self.to_check = [w for w in self.nlp.vocab if w.prob >= -15 and w.has_vector]

with

    319         self.to_check = [w for w in self.nlp.vocab if w.prob >= -20 and w.has_vector]

it starts working. This is because the values in w.prob are all equal to -20, so none of them pass the original >= -15 check.
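To illustrate why the threshold matters, here is a minimal sketch that mirrors the filter from the traceback using stand-in objects instead of real spaCy lexemes. FakeLexeme and select_candidates are hypothetical names made up for this example, and the uniform -20.0 log-probability is an assumption based on the behaviour observed above, not something read from spaCy itself.

```python
from dataclasses import dataclass


@dataclass
class FakeLexeme:
    """Hypothetical stand-in for a spaCy lexeme; only the attributes the filter reads."""
    text: str
    prob: float
    has_vector: bool


def select_candidates(vocab, min_prob):
    """Same condition as the list comprehension in the traceback:
    keep entries that have a vector and whose prob is at least min_prob."""
    return [w for w in vocab if w.prob >= min_prob and w.has_vector]


# Assumption: every entry reports the same placeholder log-probability (-20.0).
vocab = [
    FakeLexeme("good", -20.0, True),
    FakeLexeme("bad", -20.0, True),
    FakeLexeme("zzz", -20.0, False),  # no vector, so it is always excluded
]

strict = select_candidates(vocab, -15)   # original threshold
relaxed = select_candidates(vocab, -20)  # modified threshold

print(len(strict), len(relaxed))
```

With a uniform prob, the original threshold rejects everything, which is exactly what triggers the "No vectors" exception even though the vectors are there. Lowering the threshold to -20 effectively reduces the filter to the has_vector check, so it no longer excludes rare words.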

It seems that spaCy phased out w.prob, and also en_core_web_lg, in the newer versions. I think I'll just remove spaCy support; BERT is just better.