Issue with spaCy and en_core_web_lg
EmanueleLM opened this issue
Hi,
I'm trying to run this simple snippet of code after having successfully (i.e., with no errors or warnings) installed anchor, spaCy and all the requirements (including running 'python -m spacy download en_core_web_lg'):
import spacy
from anchor import anchor_text
nlp = spacy.load('en_core_web_lg')
explainer = anchor_text.AnchorText(nlp, ['negative', 'positive'], use_unk_distribution=False, use_bert=False)
But I obtain the following error:
Exception Traceback (most recent call last)
<ipython-input-5-7f4e7f3d6066> in <module>
----> 1 explainer = anchor_text.AnchorText(nlp, ['negative', 'positive'], use_unk_distribution=False, use_bert=False)
~/.local/lib/python3.7/site-packages/anchor/anchor_text.py in __init__(self, nlp, class_names, use_unk_distribution, use_bert, mask_string)
117 self.tg = None
118 self.use_bert = use_bert
--> 119 self.neighbors = utils.Neighbors(self.nlp)
120 self.mask_string = mask_string
121 if not self.use_unk_distribution and self.use_bert:
~/.local/lib/python3.7/site-packages/anchor/utils.py in __init__(self, nlp_obj)
319 self.to_check = [w for w in self.nlp.vocab if w.prob >= -15 and w.has_vector]
320 if not self.to_check:
--> 321 raise Exception('No vectors. Are you using en_core_web_sm? It should be en_core_web_lg')
322 self.n = {}
323
Exception: No vectors. Are you using en_core_web_sm? It should be en_core_web_lg
I'm using this setting:
Fedora 30 (but I can replicate it on Ubuntu 18.04)
python 3.7.4
spacy 2.3.2 (but I've also tried with 2.2.3)
Thank you,
Emanuele
I can add that if I modify line 319 of ~/.local/lib/python3.7/site-packages/anchor/utils.py (the line shown in the traceback above),
i.e.
319 self.to_check = [w for w in self.nlp.vocab if w.prob >= -15 and w.has_vector]
with
319 self.to_check = [w for w in self.nlp.vocab if w.prob >= -20 and w.has_vector]
it starts working. This is because the values in w.prob are all equal to -20, so none of them pass the original >= -15 threshold.
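For anyone who wants to see the effect without installing the model, here is a minimal sketch of the filter from line 319. It assumes nothing about spaCy itself: FakeLexeme, candidates and prob_histogram are hypothetical names made up for this illustration, and the toy vocabulary only mimics the situation described above, where every w.prob is -20.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class FakeLexeme:
    """Hypothetical stand-in for a lexeme: only the attributes the filter reads."""
    text: str
    prob: float
    has_vector: bool


def candidates(vocab, min_log_prob):
    """Same condition as line 319 of utils.py, with the threshold as a parameter."""
    return [w for w in vocab if w.prob >= min_log_prob and w.has_vector]


def prob_histogram(vocab):
    """Count how often each prob value occurs, to spot a constant default."""
    return Counter(w.prob for w in vocab)


# Toy vocabulary (assumption): every entry carries the same prob of -20.
vocab = [
    FakeLexeme("good", -20.0, True),
    FakeLexeme("bad", -20.0, True),
    FakeLexeme("zzxq", -20.0, False),
]

print(len(candidates(vocab, -15)))  # 0: nothing passes, so the exception is raised
print(len(candidates(vocab, -20)))  # 2: every word with a vector passes
print(prob_histogram(vocab))        # a single key means prob carries no information
```

With a real model, the same two helpers should work on nlp.vocab directly, since iterating it yields objects exposing prob and has_vector. Note that lowering the threshold to -20 does not restore frequency filtering when prob is constant: it only keeps every word that has a vector.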
It seems that spaCy phased out w.prob and also en_core_web_lg in the newer versions. I think I'll just remove spaCy support; BERT is just better.
Did so in 5f38997