aphp / edsnlp

Modular, fast NLP framework, compatible with Pytorch and spaCy, offering tailored support for French clinical notes.

Home Page: https://aphp.github.io/edsnlp/


Error when using qualifiers with spacy-transformers model

louni-g opened this issue · comments

When loading a pipeline from disk, if the pipeline contains a spacy-transformers model and any edsnlp qualifier, the following error is raised:

KeyError: "Parameter 'W' for model 'softmax' has not been allocated yet."

Description

Full Traceback
File "/Users/Louise/Library/Application Support/JetBrains/PyCharm2023.2/scratches/scratch.py", line 8, in <module>
    nlp = spacy.load("nlp")
  File "/Users/Louise/Documents/Projects/ai/scracth_venv/lib/python3.9/site-packages/spacy/__init__.py", line 51, in load
    return util.load_model(
  File "/Users/Louise/Documents/Projects/ai/scracth_venv/lib/python3.9/site-packages/spacy/util.py", line 467, in load_model
    return load_model_from_path(Path(name), **kwargs)  # type: ignore[arg-type]
  File "/Users/Louise/Documents/Projects/ai/scracth_venv/lib/python3.9/site-packages/spacy/util.py", line 539, in load_model_from_path
    nlp = load_model_from_config(
  File "/Users/Louise/Documents/Projects/ai/scracth_venv/lib/python3.9/site-packages/spacy/util.py", line 587, in load_model_from_config
    nlp = lang_cls.from_config(
  File "/Users/Louise/Documents/Projects/ai/scracth_venv/lib/python3.9/site-packages/spacy/language.py", line 1864, in from_config
    nlp.add_pipe(
  File "/Users/Louise/Documents/Projects/ai/scracth_venv/lib/python3.9/site-packages/spacy/language.py", line 821, in add_pipe
    pipe_component = self.create_pipe(
  File "/Users/Louise/Documents/Projects/ai/scracth_venv/lib/python3.9/site-packages/spacy/language.py", line 709, in create_pipe
    resolved = registry.resolve(cfg, validate=validate)
  File "/Users/Louise/Documents/Projects/ai/scracth_venv/lib/python3.9/site-packages/confection/__init__.py", line 756, in resolve
    resolved, _ = cls._make(
  File "/Users/Louise/Documents/Projects/ai/scracth_venv/lib/python3.9/site-packages/confection/__init__.py", line 805, in _make
    filled, _, resolved = cls._fill(
  File "/Users/Louise/Documents/Projects/ai/scracth_venv/lib/python3.9/site-packages/confection/__init__.py", line 877, in _fill
    getter_result = getter(*args, **kwargs)
  File "/Users/Louise/Documents/Projects/ai/scracth_venv/lib/python3.9/site-packages/edsnlp/pipelines/qualifiers/negation/negation.py", line 174, in __init__
    super().__init__(
  File "/Users/Louise/Documents/Projects/ai/scracth_venv/lib/python3.9/site-packages/edsnlp/pipelines/qualifiers/base.py", line 84, in __init__
    self.phrase_matcher.build_patterns(nlp=nlp, terms=terms)
  File "edsnlp/matchers/phrase.pyx", line 99, in edsnlp.matchers.phrase.EDSPhraseMatcher.build_patterns
  File "edsnlp/matchers/phrase.pyx", line 111, in edsnlp.matchers.phrase.EDSPhraseMatcher.build_patterns
  File "/Users/Louise/Documents/Projects/ai/scracth_venv/lib/python3.9/site-packages/spacy/language.py", line 1618, in pipe
    for doc in docs:
  File "/Users/Louise/Documents/Projects/ai/scracth_venv/lib/python3.9/site-packages/spacy/util.py", line 1685, in _pipe
    yield from proc.pipe(docs, **kwargs)
  File "spacy/pipeline/pipe.pyx", line 55, in pipe
  File "/Users/Louise/Documents/Projects/ai/scracth_venv/lib/python3.9/site-packages/spacy/util.py", line 1685, in _pipe
    yield from proc.pipe(docs, **kwargs)
  File "spacy/pipeline/transition_parser.pyx", line 245, in pipe
  File "/Users/Louise/Documents/Projects/ai/scracth_venv/lib/python3.9/site-packages/spacy/util.py", line 1632, in minibatch
    batch = list(itertools.islice(items, int(batch_size)))
  File "/Users/Louise/Documents/Projects/ai/scracth_venv/lib/python3.9/site-packages/spacy/util.py", line 1685, in _pipe
    yield from proc.pipe(docs, **kwargs)
  File "spacy/pipeline/trainable_pipe.pyx", line 79, in pipe
  File "/Users/Louise/Documents/Projects/ai/scracth_venv/lib/python3.9/site-packages/spacy/util.py", line 1704, in raise_error
    raise e
  File "spacy/pipeline/trainable_pipe.pyx", line 75, in spacy.pipeline.trainable_pipe.TrainablePipe.pipe
  File "spacy/pipeline/tagger.pyx", line 138, in spacy.pipeline.tagger.Tagger.predict
  File "/Users/Louise/Documents/Projects/ai/scracth_venv/lib/python3.9/site-packages/thinc/model.py", line 334, in predict
    return self._func(self, X, is_train=False)[0]
  File "/Users/Louise/Documents/Projects/ai/scracth_venv/lib/python3.9/site-packages/thinc/layers/chain.py", line 54, in forward
    Y, inc_layer_grad = layer(X, is_train=is_train)
  File "/Users/Louise/Documents/Projects/ai/scracth_venv/lib/python3.9/site-packages/thinc/model.py", line 310, in __call__
    return self._func(self, X, is_train=is_train)
  File "/Users/Louise/Documents/Projects/ai/scracth_venv/lib/python3.9/site-packages/thinc/layers/with_array.py", line 42, in forward
    return cast(Tuple[SeqT, Callable], _list_forward(model, Xseq, is_train))
  File "/Users/Louise/Documents/Projects/ai/scracth_venv/lib/python3.9/site-packages/thinc/layers/with_array.py", line 77, in _list_forward
    Yf, get_dXf = layer(Xf, is_train)
  File "/Users/Louise/Documents/Projects/ai/scracth_venv/lib/python3.9/site-packages/thinc/model.py", line 310, in __call__
    return self._func(self, X, is_train=is_train)
  File "/Users/Louise/Documents/Projects/ai/scracth_venv/lib/python3.9/site-packages/thinc/layers/softmax.py", line 69, in forward
    W = cast(Floats2d, model.get_param("W"))
  File "/Users/Louise/Documents/Projects/ai/scracth_venv/lib/python3.9/site-packages/thinc/model.py", line 235, in get_param
    raise KeyError(
KeyError: "Parameter 'W' for model 'softmax' has not been allocated yet."

The error occurs during the initialization of the qualifiers, when the token_pipelines are run in EDSPhraseMatcher's build_patterns. After a bit of digging, it seems the error comes from the fact that the spacy-transformers pipes are not fully initialized at that point, so running them raises an error. Possible fixes could be to skip the problematic pipes when they are not needed, or to perform this step once the whole pipeline has been fully initialized (i.e. not in __init__).

How to reproduce the bug

import spacy

nlp = spacy.load("fr_dep_news_trf")
nlp.add_pipe("sentencizer")
nlp.add_pipe("eds.negation", name="eds_negation")
nlp("Test")   # no problem here 
nlp.to_disk("nlp")
nlp = spacy.load("nlp")  # here is the bug 

Your Environment

  • Operating System: macOS
  • Python Version Used: 3.9
  • spaCy Version Used: 3.7.2
  • EDS-NLP Version Used: 0.9.1
  • Environment Information:
    • spacy-transformers version: 1.3.2

Hi, thank you for this detailed feedback!
Indeed, eds.negation (and any other pipe relying on the EDSPhraseMatcher) applies the same processing to the entries of its term lists as it does to documents. To do so, it filters the pipeline to keep only the pipes that affect token extensions; the lemmatizer and morphologizer components declare such changes to tokens:

nlp.get_pipe_meta('morphologizer').assigns
# ['token.morph', 'token.pos']
nlp.get_pipe_meta('lemmatizer').assigns
# ['token.lemma']
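The filtering idea described above can be sketched in plain Python. This is only an illustration of the principle, not the actual edsnlp implementation: it takes (pipe name, assigns) pairs such as those returned by nlp.get_pipe_meta(name).assigns and keeps only the pipes that declare token-level assignments.

```python
def token_pipes(pipe_assigns):
    """Keep only pipes that declare token-level assignments.

    pipe_assigns: list of (pipe_name, assigns) pairs, where assigns
    mimics nlp.get_pipe_meta(name).assigns.
    """
    return [
        name
        for name, assigns in pipe_assigns
        if any(attr.startswith("token.") for attr in assigns)
    ]


# Hypothetical pipeline metadata, matching the examples in this thread
pipes = [
    ("transformer", ["doc._.trf_data"]),
    ("morphologizer", ["token.morph", "token.pos"]),
    ("lemmatizer", ["token.lemma"]),
    ("ner", ["doc.ents", "token.ent_iob", "token.ent_type"]),
]

print(token_pipes(pipes))  # ['morphologizer', 'lemmatizer', 'ner']
```

With such a filter, the transformer pipe (which only assigns doc-level attributes) would never be run while building patterns, sidestepping the uninitialized-parameter error.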

Ideally,

  • spaCy should initialize the transformer / decoder pipe before subsequent pipes are added to the pipeline
  • edsnlp should serialize its pipes to avoid having to rerun the __init__() method (e.g. storing the precomputed .norm_ / .text patterns instead of the raw terms)

In the meantime,

  • I will update EDSPhraseMatcher (and its variants) to skip pipes that are clearly not required (as indicated by the nlp.get_pipe_meta(...).assigns attribute) or that are disabled
  • You can add edsnlp pipes before the transformer with `nlp.add_pipe(..., before="transformer")`, but this might defeat their purpose
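The second workaround could look roughly like this, reusing the setup from the reproduction snippet. This is an untested sketch: whether placing the qualifier before the transformer is acceptable depends on whether the qualifier needs the transformer's downstream annotations.

```python
import spacy

# Build the pipeline with the edsnlp qualifier placed *before* the
# transformer, so EDSPhraseMatcher.build_patterns never has to run
# the (possibly not yet initialized) transformer pipe.
nlp = spacy.load("fr_dep_news_trf")
nlp.add_pipe("sentencizer", first=True)
nlp.add_pipe("eds.negation", name="eds_negation", before="transformer")

nlp.to_disk("nlp")
nlp = spacy.load("nlp")  # the load-time KeyError should no longer occur
```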

@louni-g may I ask for what task you need a transformer in your pipeline? is it to use the pre-trained lemmatizer / morphologizer / ... pipes of spacy, or to train a new model, or something else ?

> @louni-g may I ask for what task you need a transformer in your pipeline? is it to use the pre-trained lemmatizer / morphologizer / ... pipes of spacy, or to train a new model, or something else ?

I trained a spacy-transformers NER model; in my case the pipeline only contains ["transformer", "ner"], and it's the "ner" pipe that ends up in the token_pipelines:

nlp.get_pipe_meta('ner').assigns
# ['doc.ents', 'token.ent_iob', 'token.ent_type']

so I think it would be totally ok to skip unnecessary pipes 👍