keithrozario / Klayers

Python Packages as AWS Lambda Layers

[BUG] Not able to load model en_core_web_sm

pooldiver69 opened this issue

Describe the bug

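import json
import spacy
def lambda_handler(event, context):
    nlp = spacy.load("/opt/en_core_web_sm-2.2.5/en_core_web_sm/en_core_web_sm-2.2.5")
    # Run the pipeline on a sample sentence and return the text as JSON
    doc = nlp("Hello World from spaCy")
    return {
        'statusCode': 200,
        'body': json.dumps(doc.text)
    }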

When loading en_core_web_sm with spaCy, the function fails to load the model:

{
  "errorMessage": "[E053] Could not read config.cfg from /opt/en_core_web_sm-2.2.5/en_core_web_sm/en_core_web_sm-2.2.5/config.cfg",
  "errorType": "OSError",
  "stackTrace": [
    "  File \"/var/task/lambda_function.py\", line 4, in lambda_handler\n    nlp = spacy.load(\"/opt/en_core_web_sm-2.2.5/en_core_web_sm/en_core_web_sm-2.2.5\")\n",
    "  File \"/opt/python/spacy/__init__.py\", line 50, in load\n    return util.load_model(\n",
    "  File \"/opt/python/spacy/util.py\", line 326, in load_model\n    return load_model_from_path(Path(name), **kwargs)\n",
    "  File \"/opt/python/spacy/util.py\", line 390, in load_model_from_path\n    config = load_config(config_path, overrides=dict_to_dot(config))\n",
    "  File \"/opt/python/spacy/util.py\", line 547, in load_config\n    raise IOError(Errors.E053.format(path=config_path, name=\"config.cfg\"))\n"
  ]
}
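E053 means spaCy found no config.cfg in the directory passed to spacy.load. spaCy v3 reads a pipeline's settings from config.cfg, while v2.x model packages such as en_core_web_sm-2.2.5 do not include one (the example range in the warning below suggests layer 42 ships spaCy 3.0.x). A minimal diagnostic sketch, assuming the model directory is readable from the function; describe_model_dir is a hypothetical helper, not part of spaCy:

```python
import json
from pathlib import Path


def describe_model_dir(model_dir):
    """Report which spaCy model format a directory looks like (hypothetical helper)."""
    path = Path(model_dir)
    if (path / "config.cfg").is_file():
        # spaCy v3 pipelines are loaded from config.cfg
        return "v3 pipeline (config.cfg present)"
    meta_file = path / "meta.json"
    if meta_file.is_file():
        meta = json.loads(meta_file.read_text())
        # v2 packages ship meta.json without config.cfg, which triggers E053 on spaCy v3
        return "no config.cfg; meta.json says spacy_version " + meta.get("spacy_version", "?")
    return "not a spaCy model directory"
```

Calling describe_model_dir on the path from the handler above should show whether the layer contains a v2 or v3 model before spacy.load is attempted.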

There is also a warning about the version:

/opt/python/spacy/util.py:717: UserWarning: [W094] Model 'en_core_web_sm' (2.2.5) specifies an under-constrained spaCy version requirement: >=2.2.2. This can lead to compatibility problems with older versions, or as new spaCy versions are released, because the model may say it's compatible when it's not. Consider changing the "spacy_version" in your meta.json to a version range, with a lower and upper pin. For example: >=3.0.6,<3.1.0
  warnings.warn(warn_msg)
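W094 is raised because the 2.2.5 model's meta.json declares spacy_version as >=2.2.2, a lower bound with no upper pin. The idea behind the check can be approximated in a few lines of standard-library Python; this is a rough sketch, not spaCy's own implementation, and has_upper_pin is a hypothetical name. It treats <, <=, == and ~= clauses as upper pins:

```python
def has_upper_pin(spec):
    """Return True if a specifier such as '>=3.0.6,<3.1.0' caps the spaCy version."""
    clauses = [c.strip() for c in spec.split(",") if c.strip()]
    # "<" also matches "<="; "==" is an exact pin; "~=" implies an upper bound
    return any(c.startswith(("<", "==", "~=")) for c in clauses)


print(has_upper_pin(">=2.2.2"))         # the model's requirement from the warning
print(has_upper_pin(">=3.0.6,<3.1.0"))  # the range the warning suggests instead
```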

Layer Version ARN:
arn:aws:lambda:us-west-2:770693421928:layer:Klayers-python38-spacy:42
arn:aws:lambda:us-west-2:770693421928:layer:Klayers-python38-spacy_model_en_small:1

Framework:
Just the Console.

Additional context
Tried both the Python 3.7 and 3.8 runtimes; neither works.

@pooldiver69 Not sure if this is still an issue for you, but I was getting the same error until I switched the spaCy layer ARN to version 41. I realize that isn't a good long-term solution, but it seems to be working for now.

Looks like this only works with version 41. I've extended the expiry for that layer to next year, so folks can continue using it until we have a better solution :)

arn:aws:lambda:us-west-2:770693421928:layer:Klayers-python38-spacy:41
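If the function's layers are managed from code instead of the console, pinning the spaCy layer to version 41 amounts to rewriting the version suffix of its ARN (layer version ARNs end in :layer:&lt;name&gt;:&lt;version&gt;). A sketch assuming boto3 is installed and credentials are configured; pin_layer_version and apply_layers are hypothetical helper names:

```python
def pin_layer_version(arns, layer_name, version):
    """Return a copy of arns with the named layer pinned to `version`."""
    pinned = []
    for arn in arns:
        base, _, _old_version = arn.rpartition(":")
        # base ends with ":layer:<name>", so only the exact layer name is rewritten
        if base.endswith(":layer:" + layer_name):
            arn = f"{base}:{version}"
        pinned.append(arn)
    return pinned


def apply_layers(function_name, arns, region="us-west-2"):
    import boto3  # imported lazily so pin_layer_version works without boto3

    client = boto3.client("lambda", region_name=region)
    # Layers replaces the function's whole layer list, so pass every ARN
    return client.update_function_configuration(FunctionName=function_name, Layers=arns)


layers = [
    "arn:aws:lambda:us-west-2:770693421928:layer:Klayers-python38-spacy:42",
    "arn:aws:lambda:us-west-2:770693421928:layer:Klayers-python38-spacy_model_en_small:1",
]
print(pin_layer_version(layers, "Klayers-python38-spacy", 41))
```

apply_layers("my-function", pinned) would then update a function named my-function (a placeholder) with the pinned list.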


Hi @keithrozario, love Klayers. This is still an issue with v42 of the spaCy layer. Is there any chance you can extend the expiry of v41 for another year... again? I'm looking into other potential solutions in the meantime.

I'm having the same problem; extending the expiry would be great. Is there any way we can help with releasing the newer versions?

In the meantime, here is what I did to deal with this myself. Perhaps it is useful to people with similar needs:

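# Build the layer contents under zip/python (Lambda exposes a layer's python/ dir as /opt/python)
mkdir -p zip/python
python3 -m venv zip/python/venv
cd zip/python
# Install only the model package; spaCy itself comes from the Klayers layer
./venv/bin/pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.0.0/en_core_web_sm-3.0.0.tar.gz --no-deps --target .
cd ..
# Zip python/ without the build venv or bytecode caches; the zip lands next to zip/
zip -r ./../en_core_web_sm-3.0.0.zip . -x "python/venv/*" "*__pycache__*"
cd ..
# Publish the zip as a new layer version
aws lambda publish-layer-version \
--region eu-central-1 \
--layer-name spacy-model-en-core-web-sm \
--zip-file fileb:///path/to/en_core_web_sm-3.0.0.zip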

Make sure to change the absolute path to the zip and the region in the publish-layer-version command.
The resulting layer ARN should work in combination with the spaCy layer from Klayers 😃
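The zip step in the commands above can also be reproduced with Python's standard-library zipfile module, which makes the exclusions explicit. A sketch assuming the same zip/python layout; build_layer_zip is a hypothetical name, and out_zip should live outside src_dir so the archive does not include itself:

```python
import os
import zipfile
from pathlib import Path


def build_layer_zip(src_dir, out_zip):
    """Zip src_dir (which contains python/) like `zip -r ... -x "python/venv/*" "*__pycache__*"`."""
    src = Path(src_dir)
    names = []
    with zipfile.ZipFile(out_zip, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, dirs, files in os.walk(src):
            rel_root = Path(root).relative_to(src)
            # Prune the build venv and bytecode caches so os.walk never descends into them
            dirs[:] = [
                d for d in dirs
                if d != "__pycache__" and (rel_root / d).as_posix() != "python/venv"
            ]
            for name in files:
                arcname = (rel_root / name).as_posix()  # e.g. python/en_core_web_sm/...
                zf.write(Path(root) / name, arcname)
                names.append(arcname)
    return sorted(names)
```

Keeping every entry under python/ matters because Lambda extracts the layer into /opt, and /opt/python is on the Python runtime's import path.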

After creating the layer yourself, how do you use it in Lambda? I tried
nlp = spacy.load("opt/en_core_web_sm/en_core_web_sm-3.0.0"), but it didn't work.

Simply calling nlp = spacy.load("en_core_web_sm") worked. Since the zip file exceeds 10 MB, I had to upload it to S3 first and create the Lambda layer from there.
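Publishing from S3 instead of a direct upload can also be scripted. A sketch using boto3, assuming credentials are configured and the bucket is in the same region as the layer; the bucket, key, and layer name are placeholders, python3.8 is assumed to match the Klayers python38 layers, and layer_request and publish_from_s3 are hypothetical helper names:

```python
def layer_request(layer_name, bucket, key, runtimes=("python3.8",)):
    """Build publish_layer_version arguments for a layer zip stored in S3."""
    return {
        "LayerName": layer_name,
        "Content": {"S3Bucket": bucket, "S3Key": key},
        "CompatibleRuntimes": list(runtimes),
    }


def publish_from_s3(zip_path, bucket, key, layer_name, region="eu-central-1"):
    import boto3  # imported lazily so layer_request works without boto3

    # Upload the zip first; Lambda then reads the layer content from S3
    boto3.client("s3", region_name=region).upload_file(zip_path, bucket, key)
    client = boto3.client("lambda", region_name=region)
    response = client.publish_layer_version(**layer_request(layer_name, bucket, key))
    return response["LayerVersionArn"]
```

Because the model is installed as a package under python/, the handler can then load it by name with spacy.load("en_core_web_sm"), as in the comment above.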