Installation issue: "OSError: [E050] Can't find model 'en_spacy_pii_distilbert'."
tcpiplab opened this issue
Describe the bug
There seems to be a problem with the installation instructions. I followed them, but when I attempted to run examples/openai.py I received an error.
To Reproduce
Steps to reproduce the behavior:
- Follow the installation steps from README.md.
- Set the env var for your openai API key.
- Run python examples/openai.py.
- See error.
Expected behavior
I expected the example script to run successfully.
Error Output
$ python examples/openai.py
Traceback (most recent call last):
  File "/Users/tcpiplab/Tools/llm-guard/examples/openai.py", line 9, in <module>
    import openai
  File "/Users/tcpiplab/Tools/llm-guard/examples/openai.py", line 18, in <module>
    input_scanners = [Anonymize(vault), Toxicity(), TokenLimit(), PromptInjection()]
                      ^^^^^^^^^^^^^^^^
  File "/Users/tcpiplab/Tools/llm-guard/venv/lib/python3.11/site-packages/llm_guard/input_scanners/anonymize.py", line 94, in __init__
    self._analyzer = get_analyzer(
                     ^^^^^^^^^^^^^
  File "/Users/tcpiplab/Tools/llm-guard/venv/lib/python3.11/site-packages/llm_guard/input_scanners/anonymize_helpers/analyzer.py", line 64, in get
    nlp_engine = _get_nlp_engine(recognizer)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/tcpiplab/Tools/llm-guard/venv/lib/python3.11/site-packages/llm_guard/input_scanners/anonymize_helpers/analyzer.py", line 60, in _get_nlp_engine
    return provider.create_engine()
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/tcpiplab/Tools/llm-guard/venv/lib/python3.11/site-packages/presidio_analyzer/nlp_engine/nlp_engine_provider.py", line 91, in create_engine
    engine = nlp_engine_class(nlp_engine_opts)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/tcpiplab/Tools/llm-guard/venv/lib/python3.11/site-packages/presidio_analyzer/nlp_engine/spacy_nlp_engine.py", line 36, in __init__
    self.nlp = {
               ^
  File "/Users/tcpiplab/Tools/llm-guard/venv/lib/python3.11/site-packages/presidio_analyzer/nlp_engine/spacy_nlp_engine.py", line 37, in <dictcomp>
    lang_code: spacy.load(model_name, disable=["parser"])
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/tcpiplab/Tools/llm-guard/venv/lib/python3.11/site-packages/spacy/__init__.py", line 54, in load
    return util.load_model(
           ^^^^^^^^^^^^^^^^
  File "/Users/tcpiplab/Tools/llm-guard/venv/lib/python3.11/site-packages/spacy/util.py", line 439, in load_model
    raise IOError(Errors.E050.format(name=name))
OSError: [E050] Can't find model 'en_spacy_pii_distilbert'. It doesn't seem to be a Python package or a valid path to a data directory.
Additional context
- macOS Big Sur 11.7.10
Darwin Kernel Version 20.6.0: Thu Jul 6 22:12:47 PDT 2023; root:xnu-7195.141.49.702.12~1/RELEASE_X86_64 x86_64
- Python 3.11.6
Hey @tcpiplab,
Thanks for reaching out!
We changed the way models work, so you now need to install the models of your choice yourself.
Documentation: https://llm-guard.com/input_scanners/anonymize/#get-started
To use the default model, en_spacy_pii_distilbert, run:
# en_spacy_pii_distilbert (default)
pip install https://huggingface.co/beki/en_spacy_pii_distilbert/resolve/main/en_spacy_pii_distilbert-any-py3-none-any.whl
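After installing, it may help to confirm that the model is visible from the same virtual environment before re-running the example. The snippet below is a minimal sketch, not part of llm-guard: the helper name model_installed is hypothetical, and it assumes the wheel installs a top-level Python package named after the model, which is how pip-installed spaCy models are normally laid out.

```python
import importlib.util


def model_installed(name: str) -> bool:
    """Return True if a top-level package called `name` is importable.

    Hypothetical helper: assumes the spaCy model was installed with pip
    and therefore exists as a Python package of the same name.
    """
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    model = "en_spacy_pii_distilbert"
    if model_installed(model):
        print(f"{model} is installed in this environment")
    else:
        print(f"{model} is missing; install it with the pip command above")
```

Run it with the interpreter from the venv that llm-guard is installed in; a "missing" result there would be consistent with the E050 error in the traceback.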