AI Secure (AI-secure)

Organization data from GitHub: https://github.com/AI-secure

UIUC Secure Learning Lab

Location: University of Illinois at Urbana-Champaign

Home Page: https://aisecure.github.io/

GitHub: @AI-secure

AI Secure's repositories

DecodingTrust

A Comprehensive Assessment of Trustworthiness in GPT Models

Language: Python · License: CC-BY-SA-4.0 · Stargazers: 305 · Issues: 5 · Issues: 25

AgentPoison

[NeurIPS 2024] Official implementation for "AgentPoison: Red-teaming LLM Agents via Memory or Knowledge Base Backdoor Poisoning"

Language: Python · License: MIT · Stargazers: 163 · Issues: 2 · Issues: 9

Certified-Robustness-SoK-Oldver

This repo keeps track of popular provable training and verification approaches towards robust neural networks, including leaderboards on popular datasets and paper categorization.

VeriGauge

A unified toolbox for running major robustness verification approaches for DNNs. [S&P 2023]

InfoBERT

[ICLR 2021] "InfoBERT: Improving Robustness of Language Models from An Information Theoretic Perspective" by Boxin Wang, Shuohang Wang, Yu Cheng, Zhe Gan, Ruoxi Jia, Bo Li, Jingjing Liu

RedCode

[NeurIPS'24] RedCode: Risky Code Execution and Generation Benchmark for Code Agents

FLBenchmark-toolkit

Federated Learning Framework Benchmark (UniFed)

Language: Python · License: Apache-2.0 · Stargazers: 49 · Issues: 3 · Issues: 5

aug-pe

[ICML 2024 Spotlight] Differentially Private Synthetic Data via Foundation Model APIs 2: Text

Language: Python · License: Apache-2.0 · Stargazers: 45 · Issues: 4 · Issues: 5

Robustness-Against-Backdoor-Attacks

RAB: Provable Robustness Against Backdoor Attacks

semantic-randomized-smoothing

[CCS 2021] TSS: Transformation-specific smoothing for robustness certification

MMDT

Comprehensive Assessment of Trustworthiness in Multimodal Foundation Models

Language: Jupyter Notebook · Stargazers: 24 · Issues: 3 · Issues: 1

UDora

[ICML 2025] UDora: A Unified Red Teaming Framework against LLM Agents

Language: Jupyter Notebook · Stargazers: 15 · Issues: 3 · Issues: 3

FedGame

Official implementation for paper "FedGame: A Game-Theoretic Defense against Backdoor Attacks in Federated Learning" (NeurIPS 2023).

Language: Python · License: MIT · Stargazers: 13 · Issues: 2 · Issues: 1

TextGuard

TextGuard: Provable Defense against Backdoor Attacks on Text Classification

Language: Python · Stargazers: 13 · Issues: 3 · Issues: 0

adversarial-glue

[NeurIPS 2021] "Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of Language Models" by Boxin Wang*, Chejian Xu*, Shuohang Wang, Zhe Gan, Yu Cheng, Jianfeng Gao, Ahmed Hassan Awadallah, Bo Li.

Language: Python · Stargazers: 12 · Issues: 1 · Issues: 0

SafeAuto

[ICML 2025] SafeAuto: Knowledge-Enhanced Safe Autonomous Driving with Multimodal Foundation Models

Language: Python · Stargazers: 12 · Issues: 0 · Issues: 3

CoPur

CoPur: Certifiably Robust Collaborative Inference via Feature Purification (NeurIPS 2022)

Language: Python · Stargazers: 10 · Issues: 1 · Issues: 0

CROP

[ICLR 2022] CROP: Certifying Robust Policies for Reinforcement Learning through Functional Smoothing

DPFL-Robustness

[CCS 2023] Unraveling the Connections between Privacy and Certified Robustness in Federated Learning Against Poisoning Attacks

Language: Python · Stargazers: 6 · Issues: 3 · Issues: 0

Certified-Fairness

[NeurIPS 2022] Code for Certifying Some Distributional Fairness with Subpopulation Decomposition

Language: Python · Stargazers: 5 · Issues: 3 · Issues: 0

SecretGen

A general model inversion attack against large pre-trained models.

Language: Python · License: MIT · Stargazers: 5 · Issues: 1 · Issues: 1

VFL-ADMM

Improving Privacy-Preserving Vertical Federated Learning by Efficient Communication with ADMM (SaTML 2024)

Language: Python · License: Apache-2.0 · Stargazers: 4 · Issues: 2 · Issues: 0

helm

Holistic Evaluation of Language Models (HELM), a framework to increase the transparency of language models (https://arxiv.org/abs/2211.09110).

Language: Python · License: Apache-2.0 · Stargazers: 0 · Issues: 0 · Issues: 0

hf-blog

Public repo for HF blog posts

Language: Jupyter Notebook · Stargazers: 0 · Issues: 0 · Issues: 0