Error when using qualifiers with spacy-transformers model
louni-g opened this issue
When loading a pipeline from disk, if the pipeline contains a spacy-transformers model and any edsnlp qualifiers, this error is encountered:
KeyError: "Parameter 'W' for model 'softmax' has not been allocated yet."
Description
Full Traceback
```
  File "/Users/Louise/Library/Application Support/JetBrains/PyCharm2023.2/scratches/scratch.py", line 8, in <module>
    nlp = spacy.load("nlp")
  File "/Users/Louise/Documents/Projects/ai/scracth_venv/lib/python3.9/site-packages/spacy/__init__.py", line 51, in load
    return util.load_model(
  File "/Users/Louise/Documents/Projects/ai/scracth_venv/lib/python3.9/site-packages/spacy/util.py", line 467, in load_model
    return load_model_from_path(Path(name), **kwargs)  # type: ignore[arg-type]
  File "/Users/Louise/Documents/Projects/ai/scracth_venv/lib/python3.9/site-packages/spacy/util.py", line 539, in load_model_from_path
    nlp = load_model_from_config(
  File "/Users/Louise/Documents/Projects/ai/scracth_venv/lib/python3.9/site-packages/spacy/util.py", line 587, in load_model_from_config
    nlp = lang_cls.from_config(
  File "/Users/Louise/Documents/Projects/ai/scracth_venv/lib/python3.9/site-packages/spacy/language.py", line 1864, in from_config
    nlp.add_pipe(
  File "/Users/Louise/Documents/Projects/ai/scracth_venv/lib/python3.9/site-packages/spacy/language.py", line 821, in add_pipe
    pipe_component = self.create_pipe(
  File "/Users/Louise/Documents/Projects/ai/scracth_venv/lib/python3.9/site-packages/spacy/language.py", line 709, in create_pipe
    resolved = registry.resolve(cfg, validate=validate)
  File "/Users/Louise/Documents/Projects/ai/scracth_venv/lib/python3.9/site-packages/confection/__init__.py", line 756, in resolve
    resolved, _ = cls._make(
  File "/Users/Louise/Documents/Projects/ai/scracth_venv/lib/python3.9/site-packages/confection/__init__.py", line 805, in _make
    filled, _, resolved = cls._fill(
  File "/Users/Louise/Documents/Projects/ai/scracth_venv/lib/python3.9/site-packages/confection/__init__.py", line 877, in _fill
    getter_result = getter(*args, **kwargs)
  File "/Users/Louise/Documents/Projects/ai/scracth_venv/lib/python3.9/site-packages/edsnlp/pipelines/qualifiers/negation/negation.py", line 174, in __init__
    super().__init__(
  File "/Users/Louise/Documents/Projects/ai/scracth_venv/lib/python3.9/site-packages/edsnlp/pipelines/qualifiers/base.py", line 84, in __init__
    self.phrase_matcher.build_patterns(nlp=nlp, terms=terms)
  File "edsnlp/matchers/phrase.pyx", line 99, in edsnlp.matchers.phrase.EDSPhraseMatcher.build_patterns
  File "edsnlp/matchers/phrase.pyx", line 111, in edsnlp.matchers.phrase.EDSPhraseMatcher.build_patterns
  File "/Users/Louise/Documents/Projects/ai/scracth_venv/lib/python3.9/site-packages/spacy/language.py", line 1618, in pipe
    for doc in docs:
  File "/Users/Louise/Documents/Projects/ai/scracth_venv/lib/python3.9/site-packages/spacy/util.py", line 1685, in _pipe
    yield from proc.pipe(docs, **kwargs)
  File "spacy/pipeline/pipe.pyx", line 55, in pipe
  File "/Users/Louise/Documents/Projects/ai/scracth_venv/lib/python3.9/site-packages/spacy/util.py", line 1685, in _pipe
    yield from proc.pipe(docs, **kwargs)
  File "spacy/pipeline/transition_parser.pyx", line 245, in pipe
  File "/Users/Louise/Documents/Projects/ai/scracth_venv/lib/python3.9/site-packages/spacy/util.py", line 1632, in minibatch
    batch = list(itertools.islice(items, int(batch_size)))
  File "/Users/Louise/Documents/Projects/ai/scracth_venv/lib/python3.9/site-packages/spacy/util.py", line 1685, in _pipe
    yield from proc.pipe(docs, **kwargs)
  File "spacy/pipeline/trainable_pipe.pyx", line 79, in pipe
  File "/Users/Louise/Documents/Projects/ai/scracth_venv/lib/python3.9/site-packages/spacy/util.py", line 1704, in raise_error
    raise e
  File "spacy/pipeline/trainable_pipe.pyx", line 75, in spacy.pipeline.trainable_pipe.TrainablePipe.pipe
  File "spacy/pipeline/tagger.pyx", line 138, in spacy.pipeline.tagger.Tagger.predict
  File "/Users/Louise/Documents/Projects/ai/scracth_venv/lib/python3.9/site-packages/thinc/model.py", line 334, in predict
    return self._func(self, X, is_train=False)[0]
  File "/Users/Louise/Documents/Projects/ai/scracth_venv/lib/python3.9/site-packages/thinc/layers/chain.py", line 54, in forward
    Y, inc_layer_grad = layer(X, is_train=is_train)
  File "/Users/Louise/Documents/Projects/ai/scracth_venv/lib/python3.9/site-packages/thinc/model.py", line 310, in __call__
    return self._func(self, X, is_train=is_train)
  File "/Users/Louise/Documents/Projects/ai/scracth_venv/lib/python3.9/site-packages/thinc/layers/with_array.py", line 42, in forward
    return cast(Tuple[SeqT, Callable], _list_forward(model, Xseq, is_train))
  File "/Users/Louise/Documents/Projects/ai/scracth_venv/lib/python3.9/site-packages/thinc/layers/with_array.py", line 77, in _list_forward
    Yf, get_dXf = layer(Xf, is_train)
  File "/Users/Louise/Documents/Projects/ai/scracth_venv/lib/python3.9/site-packages/thinc/model.py", line 310, in __call__
    return self._func(self, X, is_train=is_train)
  File "/Users/Louise/Documents/Projects/ai/scracth_venv/lib/python3.9/site-packages/thinc/layers/softmax.py", line 69, in forward
    W = cast(Floats2d, model.get_param("W"))
  File "/Users/Louise/Documents/Projects/ai/scracth_venv/lib/python3.9/site-packages/thinc/model.py", line 235, in get_param
    raise KeyError(
KeyError: "Parameter 'W' for model 'softmax' has not been allocated yet."
```
The error occurs during the initialization of the qualifiers, when the `token_pipelines` are run in `EDSPhraseMatcher`'s `build_patterns`. I did a bit of digging and it seems the error comes from the fact that the spacy-transformers pipes are not fully initialized at this point, so running them raises an error. Possible fixes could be to skip the problematic pipes when they are not necessary, or to do this step once the whole pipeline has been completely initialized (i.e. not in `__init__`).
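The first fix could be sketched like this (a hypothetical illustration, not the actual edsnlp code): filter the pipeline using each pipe's declared `assigns` metadata (what `nlp.get_pipe_meta(name).assigns` returns), keeping only the pipes that write a token attribute the phrase matcher can actually normalize on, so that e.g. the transformer (which only assigns `doc._.trf_data`) is never run. The `REQUIRED_ATTRS` set is an assumption about which attributes matter.

```python
# Hypothetical sketch, not the actual edsnlp fix: keep only the pipes whose
# declared `assigns` metadata includes a token attribute that the phrase
# matcher may need for normalization.

# Token attributes the matcher could match against (assumption).
REQUIRED_ATTRS = {"token.norm", "token.lower", "token.lemma",
                  "token.pos", "token.morph", "token.shape"}

def pipes_to_run(pipe_assigns):
    """pipe_assigns maps a pipe name to its list of assigned attributes,
    as declared in the pipe factory metadata."""
    return [name for name, assigns in pipe_assigns.items()
            if REQUIRED_ATTRS.intersection(assigns)]

# These values mirror spaCy's published pipe metadata.
pipes = {
    "transformer": ["doc._.trf_data"],
    "morphologizer": ["token.morph", "token.pos"],
    "lemmatizer": ["token.lemma"],
    "ner": ["doc.ents", "token.ent_iob", "token.ent_type"],
}
print(pipes_to_run(pipes))  # ['morphologizer', 'lemmatizer']
```

With such a filter, both `transformer` and `ner` would be skipped when building the patterns, since none of their assigned attributes are used for term normalization.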
How to reproduce the bug
```python
import spacy

nlp = spacy.load("fr_dep_news_trf")
nlp.add_pipe("sentencizer")
nlp.add_pipe("eds.negation", name="eds_negation")
nlp("Test")  # no problem here
nlp.to_disk("nlp")
nlp = spacy.load("nlp")  # here is the bug
```
Your Environment
- Operating System: macOS
- Python Version Used: 3.9
- spaCy Version Used: 3.7.2
- EDS-NLP Version Used: 0.9.1
- Environment Information:
  - spacy-transformers version: 1.3.2
Hi, thank you for this detailed feedback!
Indeed, `eds.negation` (and any other pipe relying on the `EDSPhraseMatcher`) applies the same processing to the entries of its term lists as it does to documents. To do so, it filters the pipes to keep those that affect token extensions, and the `lemmatizer` and `morphologizer` components declare such changes to tokens:
```python
nlp.get_pipe_meta('morphologizer').assigns
# ['token.morph', 'token.pos']
nlp.get_pipe_meta('lemmatizer').assigns
# ['token.lemma']
```
Ideally,
- spaCy should initialize the transformer / decoder pipe before subsequent pipes are added to the pipeline
- edsnlp should serialize its pipes to avoid having to rerun the `__init__()` method (e.g. instead of storing terms, storing the `.norm_`, `.text` extensions, ...)
In the meantime,
- I will update `EDSPhraseMatcher` (and its variants) to skip pipes that are clearly not required (as indicated by the `nlp.get_pipe_meta(...).assigns` attribute), as well as pipes that are disabled
- You can add edsnlp pipes before the transformer with `nlp.add_pipe(..., before="transformer")`, but this might defeat their purpose
@louni-g may I ask for what task you need a transformer in your pipeline? is it to use the pre-trained lemmatizer / morphologizer / ... pipes of spacy, or to train a new model, or something else ?
> @louni-g may I ask for what task you need a transformer in your pipeline? is it to use the pre-trained lemmatizer / morphologizer / ... pipes of spacy, or to train a new model, or something else ?
I trained a spacy-transformers NER model, and in my case I only have the following pipes: `["transformer", "ner"]`; it's the `ner` one that ends up in the `token_pipelines`:
```python
nlp.get_pipe_meta('ner').assigns
# ['doc.ents', 'token.ent_iob', 'token.ent_type']
```
so I think it would be totally ok to skip non-necessary pipes 👍