andyz245

www.andyzhou.ai

Andy Zhou's starred repositories

R-Judge

R-Judge: Benchmarking Safety Risk Awareness for LLM Agents

Language: Python | 5300

HarmBench

HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal

Language: Jupyter Notebook | License: MIT | 21600

persuasive_jailbreaker

Persuasive Jailbreaker: we can persuade LLMs to jailbreak them!

Language: HTML | 21000

SWE-agent

SWE-agent takes a GitHub issue and tries to automatically fix it, using GPT-4, or your LM of choice. It solves 12.47% of bugs in the SWE-bench evaluation set and takes just 1 minute to run.

Language: Python | License: MIT | 1196900

fedselect

[CVPR 2024] Official Repository for "FedSelect: Personalized Federated Learning with Customized Selection of Parameters for Fine-Tuning"

700

OpenDevin

🐚 OpenDevin: Code Less, Make More

Language: Python | License: MIT | 2869000

OSWorld

OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments

Language: Python | License: Apache-2.0 | 106000

CS421-My-Solutions

Language: OCaml | 100

reflexion

[NeurIPS 2023] Reflexion: Language Agents with Verbal Reinforcement Learning

Language: Python | License: MIT | 216300

DecodingTrust

A Comprehensive Assessment of Trustworthiness in GPT Models

Language: Python | License: CC-BY-SA-4.0 | 23100

SWE-bench

[ICLR 2024] SWE-Bench: Can Language Models Resolve Real-world Github Issues?

Language: Python | License: MIT | 149100

wmdp

WMDP is an LLM proxy benchmark for hazardous knowledge in bio, cyber, and chemical security. We also release code for RMU, an unlearning method which reduces LLM performance on WMDP while retaining general capabilities.

Language: Jupyter Notebook | License: MIT | 5300

ogb

Benchmark datasets, data loaders, and evaluators for graph machine learning

Language: Python | License: MIT | 189400

ArCHer

Research Code for "ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL"

Language: Python | 7100

redteaming-resistance-benchmark

Language: Python | 2700

ReAct

[ICLR 2023] ReAct: Synergizing Reasoning and Acting in Language Models

Language: Jupyter Notebook | License: MIT | 175800

lamorel

Lamorel is a Python library designed for RL practitioners eager to use Large Language Models (LLMs).

Language: Python | License: MIT | 17600

adversarial

Code and hyperparameters for the paper "Generative Adversarial Networks"

Language: Python | License: BSD-3-Clause | 381900

rpo

Official repository for "Robust Prompt Optimization for Defending Language Models Against Jailbreaking Attacks"

Language: Python | 3200

firebase-auth-swiftui-demo

FirebaseAuth + FirebaseFirestore + SwiftUI Demo

Language: Swift | 100

lm-evaluation-harness

A framework for few-shot evaluation of language models.

Language: Python | License: MIT | 586300

CARE

Code for Certifiably Robust Learning with Reasoning via Variational Inference [IEEE SatML 2023]

Language: Python | 400

llm-attacks

Universal and Transferable Attacks on Aligned Language Models

Language: Python | License: MIT | 311400

message

Language: CSS | 35800

lm-human-preferences

Code for the paper Fine-Tuning Language Models from Human Preferences

Language: Python | License: MIT | 117000

Self-Reminder

Code for our paper "Defending ChatGPT against Jailbreak Attack via Self-Reminder" in NMI.

Language: Python | License: GPL-3.0 | 3700

bypass-paywalls-chrome

Bypass Paywalls web browser extension for Chrome and Firefox.

Language: JavaScript | 4798600

AutoDAN

The official implementation of our ICLR2024 paper "AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language Models".

Language: Python | 17000

DiscreteAdversarialDistillation

[NeurIPS 2023] Official repository for "Distilling Out-of-Distribution Robustness from Vision-Language Foundation Models"

Language: Python | License: MIT | 1000

RISE

Domain Generalization through Distilling CLIP with Language Guidance

Language: Python | 2300