An Interpretable N-gram Perplexity Threat Model for Large Language Model Jailbreaks [ICML 2025]
Home Page: https://arxiv.org/abs/2410.16222
Repository from GitHub: https://github.com/valentyn1boreiko/llm-threat-model