protectai / llm-guard

The Security Toolkit for LLM Interactions

Home Page: https://llm-guard.com/

Request for support of German language in prompt injection detection model

ulan-yisaev opened this issue · comments

Hello,

We have encountered an issue with the LLM-Guard scanner where it falsely detects German prompts as injections. Our use case requires handling German language inputs accurately. Currently, the model ProtectAI/deberta-v3-base-prompt-injection-v2 does not support German, leading to false positives.

Example:

[warning] Detected prompt injection injection_score=0.99
[info] Scanned prompt elapsed_time_seconds=0.025865 scores={'PromptInjection': 1.0}
[scan_prompt] ERROR Prompt 'wo kann ich mich gegen info impfen lassen' is not valid, scores: {'PromptInjection': 1.0}

Details:

Configuration:
We currently cannot configure the language of the prompt directly:

from llm_guard.input_scanners import InvisibleText, PromptInjection, TokenLimit, Toxicity
from llm_guard.input_scanners.prompt_injection import MatchType

input_scanners = [
    PromptInjection(threshold=settings.inp_prompt_inj_thres, match_type=MatchType.SENTENCE, use_onnx=True),
    TokenLimit(limit=settings.prompt_token_limit),
    Toxicity(threshold=settings.inp_toxic_thres, use_onnx=True),
    InvisibleText(),
]
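As a possible workaround until a language option exists, the scanner could be pointed at a different Hugging Face model. This is only a sketch: it assumes the installed llm-guard version exposes the `Model` dataclass and a `model` parameter on `PromptInjection` (present in recent releases), and it does not verify that the swapped-in model actually improves German handling.

```python
from llm_guard.input_scanners import PromptInjection
from llm_guard.input_scanners.prompt_injection import MatchType
from llm_guard.model import Model

# Hypothetical swap: use the Deepset injection model instead of the default
# ProtectAI one. Whether this is sufficient for German is untested here.
deepset_model = Model(path="deepset/deberta-v3-base-injection")

scanner = PromptInjection(
    model=deepset_model,
    threshold=0.9,  # illustrative threshold, not a recommendation
    match_type=MatchType.SENTENCE,
)
```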

Request:
Is it possible to either:

  1. Use another model that supports the German language, such as deepset/deberta-v3-base-injection, or
  2. Provide a configuration option to specify the language for the prompt injection scanner.

Thank you for your support!

I have replaced the current model with deepset/deberta-v3-base-injection, for now using the regular (non-ONNX) version. We'll see how it performs during testing.
I tested both models against our problematic prompt, and the Deepset model does seem to understand German better:
[screenshot: side-by-side comparison of the two models' injection scores]

Hey @ulan-yisaev , thanks for reaching out.

Our model doesn't support non-English prompts, unfortunately. Even deepset/deberta-v3-base-injection, although trained on some German prompts, won't handle German well, because it is built on top of deberta-v3-base, which is an English-only model.

On a side note, the ONNX version is the same model as the regular one: the weights are identical, and the only difference is the runtime, which is optimized for inference.
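To make that concrete, the two runtimes are selected with the `use_onnx` flag already shown in the configuration above; the model weights are unchanged either way. A minimal sketch (the default threshold is assumed here):

```python
from llm_guard.input_scanners import PromptInjection

# Identical model weights; only the inference engine differs.
scanner_pt = PromptInjection(use_onnx=False)   # regular PyTorch inference
scanner_onnx = PromptInjection(use_onnx=True)  # ONNX Runtime, optimized for inference
```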