CogStack / MedCATtrainer

A simple interface to inspect, improve and add concepts to biomedical NER+L -> MedCAT.

Demo - Config validation error

damiansuski opened this issue

I created a new annotation project with a CDB and vocabulary based on MedMentions and a small subset of patient notes from pt_notes.csv. When I start the new annotation project, a "Config validation error" occurs.

It recurs every time I click a document in the list. The exact error messages are:

Error: Config validation error ner -> incorrect_spans_key extra fields not permitted {'nlp': <spacy.lang.en.English object at 0x7f2187cd4c10>, 'name': 'ner', 'incorrect_spans_key': None, 'model': {'@architectures': 'spacy.TransitionBasedParser.v2', 'state_type': 'ner', 'extra_state_tokens': False, 'hidden_width': 64, 'maxout_pieces': 2, 'use_upper': True, 'nO': None, 'tok2vec': {'@architectures': 'spacy.Tok2Vec.v2', 'embed': {'@architectures': 'spacy.MultiHashEmbed.v2', 'width': 96, 'attrs': ['NORM', 'PREFIX', 'SUFFIX', 'SHAPE'], 'rows': [5000, 2500, 2500, 2500], 'include_static_vectors': True}, 'encode': {'@architectures': 'spacy.MaxoutWindowEncoder.v2', 'width': 96, 'depth': 4, 'window_size': 1, 'maxout_pieces': 3}}}, 'moves': None, 'update_with_oracle_cut_size': 100, '@factories': 'ner'}

and

Full Error:

Traceback (most recent call last):
  File "./api/views.py", line 319, in prepare_documents
    CAT_MAP=CAT_MAP, project=project)
  File "./api/utils.py", line 301, in get_medcat
    cat = CAT(cdb=cdb, config=cdb.config, vocab=vocab)
  File "/usr/local/lib/python3.7/site-packages/medcat/cat.py", line 98, in __init__
    self._create_pipeline(self.config)
  File "/usr/local/lib/python3.7/site-packages/medcat/cat.py", line 105, in _create_pipeline
    self.pipe = Pipe(tokenizer=spacy_split_all, config=config)
  File "/usr/local/lib/python3.7/site-packages/medcat/pipe.py", line 40, in __init__
    self._nlp = spacy.load(config.general['spacy_model'], disable=config.general['spacy_disabled_components'])
  File "/usr/local/lib/python3.7/site-packages/spacy/__init__.py", line 51, in load
    name, vocab=vocab, disable=disable, exclude=exclude, config=config
  File "/usr/local/lib/python3.7/site-packages/spacy/util.py", line 324, in load_model
    return load_model_from_package(name, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/spacy/util.py", line 357, in load_model_from_package
    return cls.load(vocab=vocab, disable=disable, exclude=exclude, config=config)
  File "/usr/local/lib/python3.7/site-packages/en_core_web_md/__init__.py", line 10, in load
    return load_model_from_init_py(__file__, **overrides)
  File "/usr/local/lib/python3.7/site-packages/spacy/util.py", line 523, in load_model_from_init_py
    config=config,
  File "/usr/local/lib/python3.7/site-packages/spacy/util.py", line 392, in load_model_from_path
    nlp = load_model_from_config(config, vocab=vocab, disable=disable, exclude=exclude)
  File "/usr/local/lib/python3.7/site-packages/spacy/util.py", line 435, in load_model_from_config
    validate=validate,
  File "/usr/local/lib/python3.7/site-packages/spacy/language.py", line 1677, in from_config
    raw_config=raw_config,
  File "/usr/local/lib/python3.7/site-packages/spacy/language.py", line 779, in add_pipe
    validate=validate,
  File "/usr/local/lib/python3.7/site-packages/spacy/language.py", line 660, in create_pipe
    resolved = registry.resolve(cfg, validate=validate)
  File "/usr/local/lib/python3.7/site-packages/thinc/config.py", line 730, in resolve
    config, schema=schema, overrides=overrides, validate=validate, resolve=True
  File "/usr/local/lib/python3.7/site-packages/thinc/config.py", line 779, in _make
    config, schema, validate=validate, overrides=overrides, resolve=resolve
  File "/usr/local/lib/python3.7/site-packages/thinc/config.py", line 839, in _fill
    overrides=overrides,
  File "/usr/local/lib/python3.7/site-packages/thinc/config.py", line 901, in _fill
    ) from None
thinc.config.ConfigValidationError: 

Config validation error

The problem was the spacy version. The Docker container had spacy 3.0.8, so I forced the installation of spacy 3.1.3 with the following command (run inside the container):

pip install spacy==3.1.3

just to see whether the error would disappear, and it did. However, I can't be sure this doesn't break other dependencies.
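A quick way to see why 3.0.8 fails and 3.1.3 works is to test each version against the range medcat 1.2.7 declares (`spacy<3.1.4,>=3.1.0`, taken from the build log later in this thread). The sketch below is a minimal, hypothetical helper (`satisfies` is not part of medcat or spacy) that only handles plain numeric versions and the `<`, `<=`, `>`, `>=`, `==` operators; the `packaging` library's `SpecifierSet` is the robust option for real use.

```python
import operator
from typing import Tuple

# Comparison operators supported by this simplified, hypothetical checker.
_OPS = {
    "<=": operator.le,
    ">=": operator.ge,
    "==": operator.eq,
    "<": operator.lt,
    ">": operator.gt,
}


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '3.1.3' into (3, 1, 3). Only plain numeric versions are handled."""
    return tuple(int(part) for part in text.strip().split("."))


def satisfies(version: str, spec: str) -> bool:
    """Return True if `version` meets every comma-separated clause in `spec`."""
    v = parse_version(version)
    for clause in spec.split(","):
        clause = clause.strip()
        # Check two-character operators first so '<=' is not read as '<'.
        for symbol in ("<=", ">=", "==", "<", ">"):
            if clause.startswith(symbol):
                bound = parse_version(clause[len(symbol):])
                if not _OPS[symbol](v, bound):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


MEDCAT_SPACY_SPEC = "<3.1.4,>=3.1.0"  # medcat 1.2.7 requirement from the build log

for candidate in ("3.0.8", "3.1.3"):
    print(candidate, satisfies(candidate, MEDCAT_SPACY_SPEC))
```

Inside the container, the installed version string is available as `spacy.__version__` and can be passed straight to `satisfies`.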

Hi @damiansuski, thanks for reporting the bug. spacy is pulled in as a medcat dependency, so you're right that changing its version might break other CDBs.

Did you download the CDB from somewhere or did you create it?

With the following command order, spacy is automatically downgraded to 3.0.8 because en_core_sci_md does not support spacy 3.1, but en_core_web_md is not downgraded to match:

RUN pip install -r /home/requirements.txt
RUN python -m spacy download en_core_web_md
RUN pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.4.0/en_core_sci_md-0.4.0.tar.gz

This is confirmed by the Docker build log:

ERROR: medcat 1.2.7 has requirement spacy<3.1.4,>=3.1.0, but you'll have spacy 3.0.8 which is incompatible.
ERROR: en-core-web-md 3.1.0 has requirement spacy<3.2.0,>=3.1.0, but you'll have spacy 3.0.8 which is incompatible.
Installing collected packages: click, typer, spacy, en-core-sci-md
  Found existing installation: click 8.0.4
    Uninstalling click-8.0.4:
      Successfully uninstalled click-8.0.4
  Found existing installation: typer 0.4.0
    Uninstalling typer-0.4.0:
      Successfully uninstalled typer-0.4.0
  Found existing installation: spacy 3.1.3
    Uninstalling spacy-3.1.3:
      Successfully uninstalled spacy-3.1.3
  Running setup.py install for en-core-sci-md ... done
Successfully installed click-7.1.2 en-core-sci-md-0.4.0 spacy-3.0.8 typer-0.3.2
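The `ERROR: ... has requirement ...` lines are easy to miss in a long build. A minimal sketch, assuming the exact line format shown in the log above (other pip versions may word these messages differently), extracts them so that a build step could fail on any conflict. `find_conflicts` and `Conflict` are hypothetical names, not part of pip.

```python
import re
from typing import List, NamedTuple


class Conflict(NamedTuple):
    package: str      # package whose requirement is violated, e.g. 'medcat'
    version: str      # its installed version, e.g. '1.2.7'
    requirement: str  # what it asks for, e.g. 'spacy<3.1.4,>=3.1.0'
    installed: str    # what actually got installed, e.g. 'spacy 3.0.8'


# Matches lines of the form shown in the build log above.
_CONFLICT_RE = re.compile(
    r"ERROR: (?P<package>\S+) (?P<version>\S+) has requirement "
    r"(?P<requirement>\S+), but you'll have (?P<installed>.+?) which is incompatible\."
)


def find_conflicts(log_text: str) -> List[Conflict]:
    """Return every dependency conflict reported in a pip install log."""
    return [Conflict(**m.groupdict()) for m in _CONFLICT_RE.finditer(log_text)]


build_log = """\
ERROR: medcat 1.2.7 has requirement spacy<3.1.4,>=3.1.0, but you'll have spacy 3.0.8 which is incompatible.
ERROR: en-core-web-md 3.1.0 has requirement spacy<3.2.0,>=3.1.0, but you'll have spacy 3.0.8 which is incompatible.
Successfully installed click-7.1.2 en-core-sci-md-0.4.0 spacy-3.0.8 typer-0.3.2
"""

for c in find_conflicts(build_log):
    print(f"{c.package} {c.version} wants {c.requirement}, got {c.installed}")
```

A build script could call `sys.exit(1)` whenever `find_conflicts` returns a non-empty list, turning these easily overlooked messages into a hard failure.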

Changing the command order by moving line 23 of the Dockerfile above line 21 should avoid the spacy downgrade that breaks medcat. Regression tests are still needed, though, to find out whether this causes any degradation.

I just found that en_core_sci_md-0.5.0 requires spacy 3.2. So ultimately medcat needs to work with spacy 3.2, and I'm going to create a new task for that.
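One way to see why medcat itself has to move is to treat each requirement as a half-open range of versions and intersect them. In this sketch the medcat 1.2.7 and en_core_web_md 3.1.0 ranges come from the build log above; the en_core_sci_md 0.5.0 range `>=3.2.0,<3.3.0` is an assumption standing in for "requires spacy 3.2", not that package's published metadata. The `intersect` helper is hypothetical.

```python
from typing import Optional, Tuple

Version = Tuple[int, int, int]
Interval = Tuple[Version, Version]  # [lower, upper): lower inclusive, upper exclusive


def intersect(a: Interval, b: Interval) -> Optional[Interval]:
    """Return the overlap of two half-open version ranges, or None if they are disjoint."""
    lower = max(a[0], b[0])
    upper = min(a[1], b[1])
    return (lower, upper) if lower < upper else None


# medcat 1.2.7: spacy<3.1.4,>=3.1.0 (from the build log)
medcat_1_2_7: Interval = ((3, 1, 0), (3, 1, 4))
# en_core_web_md 3.1.0: spacy<3.2.0,>=3.1.0 (from the build log)
en_core_web_md_3_1_0: Interval = ((3, 1, 0), (3, 2, 0))
# en_core_sci_md 0.5.0: ASSUMED to mean spacy>=3.2.0,<3.3.0 ("requires spacy 3.2")
en_core_sci_md_0_5_0: Interval = ((3, 2, 0), (3, 3, 0))

print(intersect(medcat_1_2_7, en_core_web_md_3_1_0))  # ((3, 1, 0), (3, 1, 4))
print(intersect(medcat_1_2_7, en_core_sci_md_0_5_0))  # None: no spacy version fits both
```

An empty intersection means no install order can satisfy both packages, so reordering the Dockerfile alone could not combine medcat 1.2.7 with en_core_sci_md 0.5.0.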

Thanks @baixiac. en_core_sci_md was added in the latest patch build for our Trust deployments; I wonder whether en_core_sci_md-0.5 is backward compatible with the 0.4 models...

I'm not sure either. Maybe ask them to confirm by opening an issue? https://github.com/allenai/scispacy/issues