titipata / scipdf_parser

Python PDF parser for scientific publications: content and figures



Import scipdf fails when en_core_web_sm exists in another conda environment

g-simmons opened this issue.

Hi, I encountered this issue when importing scipdf from a conda environment. I have several environments with spaCy installed, each probably at a slightly different spaCy version and possibly with a different version of en_core_web_sm. Downloading en_core_web_sm into the current environment appears to fix it.

import scipdf
/Users/gabe/opt/miniconda3/envs/papercast/lib/python3.7/site-packages/spacy/util.py:715: UserWarning: [W094] Model 'en_core_web_sm' (2.0.0) specifies an under-constrained spaCy version requirement: >=2.0.0a18. This can lead to compatibility problems with older versions, or as new spaCy versions are released, because the model may say it's compatible when it's not. Consider changing the "spacy_version" in your meta.json to a version range, with a lower and upper pin. For example: >=3.0.5,<3.1.0
  warnings.warn(warn_msg)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/gabe/opt/miniconda3/envs/papercast/lib/python3.7/site-packages/scipdf/__init__.py", line 9, in <module>
    from scipdf.features.text_utils import *
  File "/Users/gabe/opt/miniconda3/envs/papercast/lib/python3.7/site-packages/scipdf/features/__init__.py", line 1, in <module>
    from .text_utils import compute_readability_stats, compute_text_stats
  File "/Users/gabe/opt/miniconda3/envs/papercast/lib/python3.7/site-packages/scipdf/features/text_utils.py", line 9, in <module>
    nlp = spacy.load('en_core_web_sm')
  File "/Users/gabe/opt/miniconda3/envs/papercast/lib/python3.7/site-packages/spacy/__init__.py", line 47, in load
    return util.load_model(name, disable=disable, exclude=exclude, config=config)
  File "/Users/gabe/opt/miniconda3/envs/papercast/lib/python3.7/site-packages/spacy/util.py", line 322, in load_model
    return load_model_from_package(name, **kwargs)
  File "/Users/gabe/opt/miniconda3/envs/papercast/lib/python3.7/site-packages/spacy/util.py", line 355, in load_model_from_package
    return cls.load(vocab=vocab, disable=disable, exclude=exclude, config=config)
  File "/Users/gabe/opt/miniconda3/envs/papercast/lib/python3.7/site-packages/en_core_web_sm/__init__.py", line 12, in load
    return load_model_from_init_py(__file__, **overrides)
  File "/Users/gabe/opt/miniconda3/envs/papercast/lib/python3.7/site-packages/spacy/util.py", line 520, in load_model_from_init_py
    config=config,
  File "/Users/gabe/opt/miniconda3/envs/papercast/lib/python3.7/site-packages/spacy/util.py", line 388, in load_model_from_path
    config = load_config(config_path, overrides=dict_to_dot(config))
  File "/Users/gabe/opt/miniconda3/envs/papercast/lib/python3.7/site-packages/spacy/util.py", line 545, in load_config
    raise IOError(Errors.E053.format(path=config_path, name="config.cfg"))
OSError: [E053] Could not read config.cfg from /Users/gabe/opt/miniconda3/envs/papercast/lib/python3.7/site-packages/en_core_web_sm/en_core_web_sm-2.0.0/config.cfg

After running python -m spacy download en_core_web_sm, the import succeeds.
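The traceback suggests a spaCy v3 install (it looks for config.cfg) loading a 2.0.0 en_core_web_sm package from the same site-packages directory. Before downloading, it can help to confirm which interpreter is running and which versions it sees. The sketch below is a hypothetical helper, not part of scipdf; it assumes Python 3.8+ for importlib.metadata (on Python 3.7 it needs the importlib_metadata backport). spaCy's own python -m spacy validate command also checks installed pipelines against the installed spaCy version.

```python
import sys
from typing import Dict, Iterable, Optional

try:
    from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
except ImportError:  # Python 3.7: requires the importlib_metadata backport
    from importlib_metadata import PackageNotFoundError, version


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def environment_report(
    packages: Iterable[str] = ("spacy", "en_core_web_sm"),
) -> Dict[str, Optional[str]]:
    """Map the running interpreter path and each package to its installed version."""
    report: Dict[str, Optional[str]] = {"python": sys.executable}
    for name in packages:
        report[name] = installed_version(name)
    return report


if __name__ == "__main__":
    # A spaCy 3.x version next to en_core_web_sm 2.0.0 would explain error E053.
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

If the python path points at the environment you expect but the model version does not match the spaCy major version, reinstalling the model in that environment is the likely fix.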

Thanks for the report @g-simmons. I need to check whether there is a way to download en_core_web_sm for spaCy during installation so that this error doesn't occur.
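Besides downloading the model at install time, one option is to stop loading it at import time. The traceback shows scipdf/features/text_utils.py calling spacy.load('en_core_web_sm') at module level, so import scipdf fails even for users who never touch the text features. A minimal sketch, assuming a hypothetical get_nlp helper replaces that module-level call: the model is loaded on first use, cached, and a failed load is re-raised with the download command. The loader parameter exists only so the behaviour can be tested without spaCy.

```python
import functools
from typing import Any, Callable, Optional

DEFAULT_MODEL = "en_core_web_sm"


@functools.lru_cache(maxsize=None)
def get_nlp(model_name: str = DEFAULT_MODEL,
            loader: Optional[Callable[[str], Any]] = None) -> Any:
    """Hypothetical helper: load a spaCy pipeline on first use and cache it."""
    if loader is None:
        import spacy  # deferred so that `import scipdf` alone never loads a model
        loader = spacy.load
    try:
        return loader(model_name)
    except OSError as exc:
        # spaCy reports missing or incompatible model packages as OSError,
        # e.g. E050 "Can't find model" or E053 "Could not read config.cfg".
        # Exceptions are not cached by lru_cache, so a later call retries.
        raise OSError(
            f"Could not load the spaCy model {model_name!r} in this environment. "
            f"Install a version that matches the installed spaCy with:\n"
            f"    python -m spacy download {model_name}"
        ) from exc
```

Functions such as compute_text_stats would then call get_nlp() where they currently use the module-level nlp object.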

The same happens for me in a Jupyter notebook with Python 3.8.

@rokity, can you run it again after downloading the spaCy model?

python -m spacy download en_core_web_sm
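In a Jupyter notebook, the python found on the shell PATH is not necessarily the interpreter behind the kernel, so the command above can install the model into a different environment. A minimal sketch, assuming you want to run the same download from inside the notebook: it builds the command from sys.executable so the model lands in the kernel's own environment. The function names are hypothetical.

```python
import subprocess
import sys
from typing import Callable, List, Sequence


def spacy_download_command(model_name: str = "en_core_web_sm") -> List[str]:
    """Build `python -m spacy download <model>` for the running interpreter."""
    return [sys.executable, "-m", "spacy", "download", model_name]


def download_model(model_name: str = "en_core_web_sm",
                   run: Callable[[Sequence[str]], object] = subprocess.check_call) -> List[str]:
    """Install the model into the environment this code runs in.

    With the default `run`, a failed download raises CalledProcessError.
    """
    cmd = spacy_download_command(model_name)
    run(cmd)
    return cmd
```

After download_model() finishes, restarting the kernel is likely needed: if a broken en_core_web_sm was already imported, the old module stays cached in the running process.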

@titipata Thanks for the answer and for resolving the issue.
Sorry for the late reply; I was busy at work. I hope to join this project in the coming weeks.
I'm using Python 3.8 on a 2022 M1 MacBook.
I think we can close the issue.

Thanks @rokity. I'll close this for now.