leondz / autoredteam

autoredteam: code for training models that automatically red team other language models

Home Page: https://interhumanagreement.substack.com/p/faketoxicityprompts-automatic-red

leondz/autoredteam Issues