protectai / llm-guard

The Security Toolkit for LLM Interactions

Home Page: https://llm-guard.com/


Toxicity Scanner to return the type of content

RQledotai opened this issue

When using the input or output toxicity scanner, it would be preferable to return the type of label (e.g. sexual_explicit) instead of the offensive content. This would enable applications to communicate the issue to the user.
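
For context, a minimal sketch of current usage, assuming the documented scanner interface where `scan()` returns a `(sanitized_prompt, is_valid, risk_score)` tuple; the `threshold` parameter and the exact return shape are assumptions based on the library's docs at the time:

```python
# Sketch: current Toxicity input scanner usage (assumed interface).
from llm_guard.input_scanners import Toxicity

scanner = Toxicity(threshold=0.5)  # assumed parameter name

prompt = "some user input to check"
sanitized_prompt, is_valid, risk_score = scanner.scan(prompt)

if not is_valid:
    # Only a boolean and a score are available here. The specific toxicity
    # label that triggered the block (e.g. "sexual_explicit") is not returned,
    # so the application cannot tell the user why the input was rejected.
    print(f"Prompt blocked, risk score: {risk_score}")
```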

Hey @RQledotai, thanks for reaching out. Apologies for the delay.

I agree, and such a refactoring is in the works to return an object with more context about the reason behind the blocking. Currently, the only way to monitor this is through the logs.
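
Purely as an illustration of what such a result object could look like, here is a hypothetical sketch; the names (`ScanResult`, `label`, `scanner`) are assumptions for discussion, not llm-guard's actual or planned API:

```python
# Hypothetical result object carrying the reason behind a block.
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScanResult:
    is_valid: bool
    risk_score: float
    label: Optional[str] = None    # e.g. "sexual_explicit" for the Toxicity scanner
    scanner: Optional[str] = None  # which scanner produced the verdict


def user_message(result: ScanResult) -> str:
    """Build a user-facing message without echoing the offensive content."""
    if result.is_valid:
        return "ok"
    return f"Blocked by {result.scanner}: {result.label} (score {result.risk_score:.2f})"


print(user_message(ScanResult(False, 0.92, label="sexual_explicit", scanner="Toxicity")))
```

With something like this, applications could map labels to their own messaging instead of relying on log scraping.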