Andy Zhou's starred repositories

R-Judge

R-Judge: Benchmarking Safety Risk Awareness for LLM Agents

Language: Python | Stargazers: 53 | Issues: 0

HarmBench

HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal

Language: Jupyter Notebook | License: MIT | Stargazers: 216 | Issues: 0

persuasive_jailbreaker

Persuasive Jailbreaker: we can persuade LLMs to jailbreak them!

Language: HTML | Stargazers: 210 | Issues: 0

SWE-agent

SWE-agent takes a GitHub issue and tries to automatically fix it, using GPT-4, or your LM of choice. It solves 12.47% of bugs in the SWE-bench evaluation set and takes just 1 minute to run.

Language: Python | License: MIT | Stargazers: 11969 | Issues: 0

fedselect

[CVPR 2024] Official Repository for "FedSelect: Personalized Federated Learning with Customized Selection of Parameters for Fine-Tuning"

Stargazers: 7 | Issues: 0

OpenDevin

🐚 OpenDevin: Code Less, Make More

Language: Python | License: MIT | Stargazers: 28690 | Issues: 0

OSWorld

OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments

Language: Python | License: Apache-2.0 | Stargazers: 1060 | Issues: 0
Language: OCaml | Stargazers: 1 | Issues: 0

reflexion

[NeurIPS 2023] Reflexion: Language Agents with Verbal Reinforcement Learning

Language: Python | License: MIT | Stargazers: 2163 | Issues: 0

DecodingTrust

A Comprehensive Assessment of Trustworthiness in GPT Models

Language: Python | License: CC-BY-SA-4.0 | Stargazers: 231 | Issues: 0

SWE-bench

[ICLR 2024] SWE-Bench: Can Language Models Resolve Real-world GitHub Issues?

Language: Python | License: MIT | Stargazers: 1491 | Issues: 0

wmdp

WMDP is an LLM proxy benchmark for hazardous knowledge in bio, cyber, and chemical security. We also release code for RMU, an unlearning method that reduces LLM performance on WMDP while retaining general capabilities.

Language: Jupyter Notebook | License: MIT | Stargazers: 53 | Issues: 0

ogb

Benchmark datasets, data loaders, and evaluators for graph machine learning

Language: Python | License: MIT | Stargazers: 1894 | Issues: 0

ArCHer

Research Code for "ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL"

Language: Python | Stargazers: 71 | Issues: 0
Language: Python | Stargazers: 27 | Issues: 0

ReAct

[ICLR 2023] ReAct: Synergizing Reasoning and Acting in Language Models

Language: Jupyter Notebook | License: MIT | Stargazers: 1758 | Issues: 0

lamorel

Lamorel is a Python library designed for RL practitioners eager to use Large Language Models (LLMs).

Language: Python | License: MIT | Stargazers: 176 | Issues: 0

adversarial

Code and hyperparameters for the paper "Generative Adversarial Networks"

Language: Python | License: BSD-3-Clause | Stargazers: 3819 | Issues: 0

rpo

Official repository for "Robust Prompt Optimization for Defending Language Models Against Jailbreaking Attacks"

Language: Python | Stargazers: 32 | Issues: 0

firebase-auth-swiftui-demo

FirebaseAuth + FirebaseFirestore + SwiftUI Demo

Language: Swift | Stargazers: 1 | Issues: 0

lm-evaluation-harness

A framework for few-shot evaluation of language models.

Language: Python | License: MIT | Stargazers: 5863 | Issues: 0

CARE

Code for Certifiably Robust Learning with Reasoning via Variational Inference [IEEE SatML 2023]

Language: Python | Stargazers: 4 | Issues: 0

llm-attacks

Universal and Transferable Attacks on Aligned Language Models

Language: Python | License: MIT | Stargazers: 3114 | Issues: 0
Language: CSS | Stargazers: 358 | Issues: 0

lm-human-preferences

Code for the paper Fine-Tuning Language Models from Human Preferences

Language: Python | License: MIT | Stargazers: 1170 | Issues: 0

Self-Reminder

Code for our paper "Defending ChatGPT against Jailbreak Attack via Self-Reminder" in NMI.

Language: Python | License: GPL-3.0 | Stargazers: 37 | Issues: 0

bypass-paywalls-chrome

Bypass Paywalls web browser extension for Chrome and Firefox.

Language: JavaScript | Stargazers: 47986 | Issues: 0

AutoDAN

The official implementation of our ICLR2024 paper "AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language Models".

Language: Python | Stargazers: 170 | Issues: 0

DiscreteAdversarialDistillation

[NeurIPS 2023] Official repository for "Distilling Out-of-Distribution Robustness from Vision-Language Foundation Models"

Language: Python | License: MIT | Stargazers: 10 | Issues: 0

RISE

Domain Generalization through Distilling CLIP with Language Guidance

Language: Python | Stargazers: 23 | Issues: 0