Andy Zhou's starred repositories
persuasive_jailbreaker
Persuasive Jailbreaker: we can persuade LLMs to jailbreak them!
DecodingTrust
A Comprehensive Assessment of Trustworthiness in GPT Models
adversarial
Code and hyperparameters for the paper "Generative Adversarial Networks"
firebase-auth-swiftui-demo
FirebaseAuth + FirebaseFirestore + SwiftUI Demo
lm-evaluation-harness
A framework for few-shot evaluation of language models.
llm-attacks
Universal and Transferable Attacks on Aligned Language Models
lm-human-preferences
Code for the paper "Fine-Tuning Language Models from Human Preferences"
Self-Reminder
Code for our paper "Defending ChatGPT against Jailbreak Attack via Self-Reminder" in NMI.
bypass-paywalls-chrome
Bypass Paywalls web browser extension for Chrome and Firefox.
DiscreteAdversarialDistillation
[NeurIPS 2023] Official repository for "Distilling Out-of-Distribution Robustness from Vision-Language Foundation Models"