Giuseppe Massaro's starred repositories
PurpleLlama
Set of tools to assess and improve LLM security.
jailbreak_llms
[CCS'24] A dataset consists of 15,140 ChatGPT prompts from Reddit, Discord, websites, and open-source datasets (including 1,405 jailbreak prompts).
awesome-llm-security
A curation of awesome tools, documents and projects about LLM Security.
TheBigPromptLibrary
A collection of prompts, system prompts and LLM instructions
CipherChat
A framework to evaluate the generalization capability of safety alignment for LLMs
EasyJailbreak
An easy-to-use Python framework to generate adversarial jailbreak prompts.
PromptInject
PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to adversarial prompt attacks. 🏆 Best Paper Awards @ NeurIPS ML Safety Workshop 2022
llm-security
Dropbox LLM Security research code and results
llm-adaptive-attacks
Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks [arXiv, Apr 2024]
jailbreakbench
An Open Robustness Benchmark for Jailbreaking Language Models [arXiv 2024]
dspy-redteam
Red-Teaming Language Models with DSPy
curiosity_redteam
Official implementation of ICLR'24 paper, "Curiosity-driven Red Teaming for Large Language Models" (https://openreview.net/pdf?id=4KqkizXgXU)
llm-misinformation
The dataset and code for the paper "Can LLM-Generated Misinformation Be Detected?"
PrivacyBackdoor
Privacy backdoors