AI Secure (AI-secure)

Organization data from GitHub: https://github.com/AI-secure

UIUC Secure Learning Lab

Location: University of Illinois at Urbana-Champaign

Home Page: https://aisecure.github.io/

GitHub: @AI-secure

AI Secure's repositories

DecodingTrust

A Comprehensive Assessment of Trustworthiness in GPT Models

Language: Python · License: CC-BY-SA-4.0 · Stargazers: 305 · Issues: 5 · Issues: 25

AgentPoison

[NeurIPS 2024] Official implementation for "AgentPoison: Red-teaming LLM Agents via Memory or Knowledge Base Backdoor Poisoning"

Language: Python · License: MIT · Stargazers: 163 · Issues: 2 · Issues: 9

Certified-Robustness-SoK-Oldver

This repo keeps track of popular provable training and verification approaches towards robust neural networks, including leaderboards on popular datasets and paper categorization.

VeriGauge

A unified toolbox for running major robustness verification approaches for DNNs. [S&P 2023]

InfoBERT

[ICLR 2021] "InfoBERT: Improving Robustness of Language Models from An Information Theoretic Perspective" by Boxin Wang, Shuohang Wang, Yu Cheng, Zhe Gan, Ruoxi Jia, Bo Li, Jingjing Liu

RedCode

[NeurIPS'24] RedCode: Risky Code Execution and Generation Benchmark for Code Agents

FLBenchmark-toolkit

Federated Learning Framework Benchmark (UniFed)

Language: Python · License: Apache-2.0 · Stargazers: 49 · Issues: 3 · Issues: 5

aug-pe

[ICML 2024 Spotlight] Differentially Private Synthetic Data via Foundation Model APIs 2: Text

Language: Python · License: Apache-2.0 · Stargazers: 45 · Issues: 4 · Issues: 5

Robustness-Against-Backdoor-Attacks

RAB: Provable Robustness Against Backdoor Attacks

semantic-randomized-smoothing

[CCS 2021] TSS: Transformation-specific smoothing for robustness certification

MMDT

Comprehensive Assessment of Trustworthiness in Multimodal Foundation Models

Language: Jupyter Notebook · Stargazers: 24 · Issues: 3 · Issues: 1

UDora

[ICML 2025] UDora: A Unified Red Teaming Framework against LLM Agents

Language: Jupyter Notebook · Stargazers: 15 · Issues: 3 · Issues: 3

FedGame

Official implementation for paper "FedGame: A Game-Theoretic Defense against Backdoor Attacks in Federated Learning" (NeurIPS 2023).

Language: Python · License: MIT · Stargazers: 13 · Issues: 2 · Issues: 1

TextGuard

TextGuard: Provable Defense against Backdoor Attacks on Text Classification

Language: Python · Stargazers: 13 · Issues: 3 · Issues: 0

adversarial-glue

[NeurIPS 2021] "Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of Language Models" by Boxin Wang*, Chejian Xu*, Shuohang Wang, Zhe Gan, Yu Cheng, Jianfeng Gao, Ahmed Hassan Awadallah, Bo Li.

Language: Python · Stargazers: 12 · Issues: 1 · Issues: 0

SafeAuto

[ICML 2025] SafeAuto: Knowledge-Enhanced Safe Autonomous Driving with Multimodal Foundation Models

Language: Python · Stargazers: 12 · Issues: 0 · Issues: 3

CoPur

CoPur: Certifiably Robust Collaborative Inference via Feature Purification (NeurIPS 2022)

Language: Python · Stargazers: 10 · Issues: 1 · Issues: 0

CROP

[ICLR 2022] CROP: Certifying Robust Policies for Reinforcement Learning through Functional Smoothing

DPFL-Robustness

[CCS 2023] Unraveling the Connections between Privacy and Certified Robustness in Federated Learning Against Poisoning Attacks

Language: Python · Stargazers: 6 · Issues: 3 · Issues: 0

Certified-Fairness

[NeurIPS 2022] Code for Certifying Some Distributional Fairness with Subpopulation Decomposition

Language: Python · Stargazers: 5 · Issues: 3 · Issues: 0

SecretGen

A general model inversion attack against large pre-trained models.

Language: Python · License: MIT · Stargazers: 5 · Issues: 1 · Issues: 1

VFL-ADMM

Improving Privacy-Preserving Vertical Federated Learning by Efficient Communication with ADMM (SaTML 2024)

Language: Python · License: Apache-2.0 · Stargazers: 4 · Issues: 2 · Issues: 0

helm

Holistic Evaluation of Language Models (HELM), a framework to increase the transparency of language models (https://arxiv.org/abs/2211.09110).

Language: Python · License: Apache-2.0 · Stargazers: 0 · Issues: 0 · Issues: 0

hf-blog

Public repo for HF blog posts

Language: Jupyter Notebook · Stargazers: 0 · Issues: 0 · Issues: 0