protectai / llm-guard

The Security Toolkit for LLM Interactions

Home Page: https://llm-guard.com/

Installation issue: "OSError: [E050] Can't find model 'en_spacy_pii_distilbert'."

tcpiplab opened this issue

Describe the bug
There seems to be a problem with the installation instructions. I followed them, but when I attempted to run examples/openai.py I received an error.

To Reproduce
Steps to reproduce the behavior (a condensed shell sketch follows the list):

  1. Follow the installation steps from README.md.
  2. Set the env var for your openai API key.
  3. Run python examples/openai.py.
  4. See error.
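
For reference, a condensed shell version of those steps; the pip-based install command and the OPENAI_API_KEY variable name are assumptions about the README flow, not quoted from it:

# condensed repro sketch -- adjust the install command / env var to match README.md
python3 -m venv venv && source venv/bin/activate
pip install llm-guard                 # assumed install step from README.md
export OPENAI_API_KEY="sk-..."        # assumed env var name read by the example
python examples/openai.py             # fails with the spaCy E050 error shown below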

Expected behavior
I expected the example script to run successfully.

Error Output

$ python examples/openai.py
Traceback (most recent call last):
  File "/Users/tcpiplab/Tools/llm-guard/examples/openai.py", line 9, in <module>
    import openai
  File "/Users/tcpiplab/Tools/llm-guard/examples/openai.py", line 18, in <module>
    input_scanners = [Anonymize(vault), Toxicity(), TokenLimit(), PromptInjection()]
                      ^^^^^^^^^^^^^^^^
  File "/Users/tcpiplab/Tools/llm-guard/venv/lib/python3.11/site-packages/llm_guard/input_scanners/anonymize.py", line 94, in __init__
    self._analyzer = get_analyzer(
                     ^^^^^^^^^^^^^
  File "/Users/tcpiplab/Tools/llm-guard/venv/lib/python3.11/site-packages/llm_guard/input_scanners/anonymize_helpers/analyzer.py", line 64, in get
    nlp_engine = _get_nlp_engine(recognizer)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/tcpiplab/Tools/llm-guard/venv/lib/python3.11/site-packages/llm_guard/input_scanners/anonymize_helpers/analyzer.py", line 60, in _get_nlp_engine
    return provider.create_engine()
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/tcpiplab/Tools/llm-guard/venv/lib/python3.11/site-packages/presidio_analyzer/nlp_engine/nlp_engine_provider.py", line 91, in create_engine
    engine = nlp_engine_class(nlp_engine_opts)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/tcpiplab/Tools/llm-guard/venv/lib/python3.11/site-packages/presidio_analyzer/nlp_engine/spacy_nlp_engine.py", line 36, in __init__
    self.nlp = {
               ^
  File "/Users/tcpiplab/Tools/llm-guard/venv/lib/python3.11/site-packages/presidio_analyzer/nlp_engine/spacy_nlp_engine.py", line 37, in <dictcomp>
    lang_code: spacy.load(model_name, disable=["parser"])
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/tcpiplab/Tools/llm-guard/venv/lib/python3.11/site-packages/spacy/__init__.py", line 54, in load
    return util.load_model(
           ^^^^^^^^^^^^^^^^
  File "/Users/tcpiplab/Tools/llm-guard/venv/lib/python3.11/site-packages/spacy/util.py", line 439, in load_model
    raise IOError(Errors.E050.format(name=name))
OSError: [E050] Can't find model 'en_spacy_pii_distilbert'. It doesn't seem to be a Python package or a valid path to a data directory.

Additional context

  • Mac OS Big Sur 11.7.10
  • Darwin Kernel Version 20.6.0: Thu Jul 6 22:12:47 PDT 2023; root:xnu-7195.141.49.702.12~1/RELEASE_X86_64 x86_64
  • Python 3.11.6

Hey @tcpiplab,
Thanks for reaching out!

We changed the way models are handled, so you now need to install the models of your choice separately.

Documentation: https://llm-guard.com/input_scanners/anonymize/#get-started

You can run this to install the default model, en_spacy_pii_distilbert:

# en_spacy_pii_distilbert (default)
pip install https://huggingface.co/beki/en_spacy_pii_distilbert/resolve/main/en_spacy_pii_distilbert-any-py3-none-any.whl
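
Once that wheel is installed into the same virtualenv the example uses, a quick sanity check (a sketch, not part of the official docs) is to load the pipeline directly and then rerun the example:

# verify the spaCy pipeline resolves, then rerun the failing example
python -c "import spacy; spacy.load('en_spacy_pii_distilbert')"
python examples/openai.py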