RuntimeError: dictionary changed size during iteration with spacy.load()
akapeletzis opened this issue
I am loading a spaCy model as part of a step in my Dataflow streaming pipeline. To load the pre-downloaded spaCy model for a specific language I am using nlp_model = spacy.load(SPACY_KEYS[lang]) where SPACY_KEYS is a dictionary containing the names of the models for each language (e.g. 'en': 'en_core_web_sm').
This works without any issues for the majority of the jobs run by the pipeline, but occasionally it fails with the following error, which seems to come from catalogue:
Error message from worker: generic::unknown: Traceback (most recent call last):
File "apache_beam/runners/common.py", line 1232, in apache_beam.runners.common.DoFnRunner.process
File "apache_beam/runners/common.py", line 752, in apache_beam.runners.common.PerWindowInvoker.invoke_process
File "apache_beam/runners/common.py", line 870, in apache_beam.runners.common.PerWindowInvoker._invoke_process_per_window
File "apache_beam/runners/common.py", line 1368, in apache_beam.runners.common._OutputProcessor.process_outputs
File "/usr/local/lib/python3.7/site-packages/submodules/entities_and_pii_removal.py", line 259, in entities_and_PII
nlp_model = spacy.load(SPACY_KEYS[lang]) # load spacy model
File "/usr/local/lib/python3.7/site-packages/spacy/__init__.py", line 52, in load
name, vocab=vocab, disable=disable, exclude=exclude, config=config
File "/usr/local/lib/python3.7/site-packages/spacy/util.py", line 420, in load_model
return load_model_from_package(name, **kwargs) # type: ignore[arg-type]
File "/usr/local/lib/python3.7/site-packages/spacy/util.py", line 453, in load_model_from_package
return cls.load(vocab=vocab, disable=disable, exclude=exclude, config=config) # type: ignore[attr-defined]
File "/usr/local/lib/python3.7/site-packages/de_core_news_sm/__init__.py", line 10, in load
return load_model_from_init_py(__file__, **overrides)
File "/usr/local/lib/python3.7/site-packages/spacy/util.py", line 621, in load_model_from_init_py
config=config,
File "/usr/local/lib/python3.7/site-packages/spacy/util.py", line 489, in load_model_from_path
return nlp.from_disk(model_path, exclude=exclude, overrides=overrides)
File "/usr/local/lib/python3.7/site-packages/spacy/language.py", line 2042, in from_disk
util.from_disk(path, deserializers, exclude) # type: ignore[arg-type]
File "/usr/local/lib/python3.7/site-packages/spacy/util.py", line 1299, in from_disk
reader(path / key)
File "/usr/local/lib/python3.7/site-packages/spacy/language.py", line 2037, in <lambda>
p, exclude=["vocab"]
File "spacy/pipeline/trainable_pipe.pyx", line 343, in spacy.pipeline.trainable_pipe.TrainablePipe.from_disk
File "/usr/local/lib/python3.7/site-packages/spacy/util.py", line 1299, in from_disk
reader(path / key)
File "spacy/pipeline/trainable_pipe.pyx", line 333, in spacy.pipeline.trainable_pipe.TrainablePipe.from_disk.load_model
File "spacy/pipeline/trainable_pipe.pyx", line 334, in spacy.pipeline.trainable_pipe.TrainablePipe.from_disk.load_model
File "/usr/local/lib/python3.7/site-packages/thinc/model.py", line 593, in from_bytes
return self.from_dict(msg)
File "/usr/local/lib/python3.7/site-packages/thinc/model.py", line 624, in from_dict
loaded_value = deserialize_attr(default_value, value, attr, node)
File "/usr/local/lib/python3.7/functools.py", line 840, in wrapper
return dispatch(args[0].__class__)(*args, **kw)
File "/usr/local/lib/python3.7/site-packages/thinc/model.py", line 804, in deserialize_attr
return srsly.msgpack_loads(value)
File "/usr/local/lib/python3.7/site-packages/srsly/_msgpack_api.py", line 27, in msgpack_loads
msg = msgpack.loads(data, raw=False, use_list=use_list)
File "/usr/local/lib/python3.7/site-packages/srsly/msgpack/__init__.py", line 76, in unpackb
for decoder in msgpack_decoders.get_all().values():
File "/usr/local/lib/python3.7/site-packages/catalogue/__init__.py", line 110, in get_all
for keys, value in REGISTRY.items():
RuntimeError: dictionary changed size during iteration
Hi, I think that we need to replace REGISTRY.items() with REGISTRY.copy().items() for this to work without these kinds of runtime errors.
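To illustrate why iterating over a copy helps, here is a minimal single-threaded sketch. It is not catalogue's actual code: the REGISTRY contents, the function names, and the register_during_iteration helper are made up for illustration, and the helper only simulates what another thread might do while get_all is looping.

```python
# Simplified stand-in for catalogue's module-level REGISTRY (contents are made up).
REGISTRY = {("srsly", "msgpack_decoders", "numpy"): "decode_numpy"}


def register_during_iteration(registry):
    """Simulate another thread registering a function while we iterate."""
    new_key = ("srsly", "msgpack_decoders", f"extra_{len(registry)}")
    registry[new_key] = "decode_extra"


def get_all_direct(registry):
    """Iterate over the live dict, as in the traceback."""
    result = {}
    for keys, value in registry.items():
        result[keys[-1]] = value
        register_during_iteration(registry)  # dict grows mid-iteration
    return result


def get_all_snapshot(registry):
    """Iterate over a shallow copy, as in the proposed change."""
    result = {}
    for keys, value in registry.copy().items():
        result[keys[-1]] = value
        register_during_iteration(registry)  # only the original grows
    return result


def raises_runtime_error(func, registry):
    """Return True if func(registry) raises RuntimeError."""
    try:
        func(registry)
    except RuntimeError:
        return True
    return False


direct_failed = raises_runtime_error(get_all_direct, dict(REGISTRY))
snapshot_failed = raises_runtime_error(get_all_snapshot, dict(REGISTRY))
print(direct_failed, snapshot_failed)  # True False
```

The snapshot version may miss entries registered after the copy was taken, which should be acceptable for a lookup like this one.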
It's kind of hard to reproduce this kind of error on our end, so it would be helpful for us if you could try installing the modified version of catalogue from #29 to see if it resolves this problem for you:
pip install https://github.com/adrianeboyd/catalogue/archive/refs/heads/bugfix/copy-items-iteration.zip
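As a possible stopgap on the pipeline side, the model could be loaded once per language behind a lock instead of on every call. This is only a sketch and rests on an assumption the thread does not confirm: that the error is triggered by several worker threads calling spacy.load() at the same time. The names get_model, _MODELS, _MODELS_LOCK and fake_loader are hypothetical, and fake_loader stands in for spacy.load so the example runs without spaCy installed.

```python
import threading

_MODELS = {}                      # lang -> loaded model (hypothetical cache)
_MODELS_LOCK = threading.Lock()   # serializes loading across threads


def get_model(lang, loader):
    """Return the cached model for lang, loading it at most once."""
    with _MODELS_LOCK:
        if lang not in _MODELS:
            _MODELS[lang] = loader(lang)
        return _MODELS[lang]


# Stand-in for spacy.load, used here only to show the behavior.
calls = []


def fake_loader(lang):
    calls.append(lang)
    return f"model-for-{lang}"


threads = [
    threading.Thread(target=get_model, args=("de", fake_loader))
    for _ in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(calls)  # ['de']
```

In the pipeline, the loader would be something like lambda lang: spacy.load(SPACY_KEYS[lang]). The lock only covers loads that go through get_model, so it would not help if the registry is modified from elsewhere.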