Repositories under the trustworthy-ai topic:
Adversarial Robustness Toolbox (ART) - Python Library for Machine Learning Security - Evasion, Poisoning, Extraction, Inference - Red and Blue Teams
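To illustrate the kind of evasion attack ART covers, here is a minimal NumPy sketch of a fast-gradient-sign perturbation against a toy logistic-regression classifier. This is a conceptual sketch only, not ART's actual API; the model, weights, and `fgsm_perturb` helper are all hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_perturb(x, y, w, b, eps):
    """Nudge input x in the direction that increases the logistic loss.

    Hypothetical toy model: p = sigmoid(w @ x + b), cross-entropy loss,
    whose gradient with respect to x is (p - y) * w.
    """
    p = sigmoid(x @ w + b)
    grad_x = (p - y) * w
    return x + eps * np.sign(grad_x)

# A correctly classified point: w @ x + b = 1.0 > 0, so class 1.
w = np.array([1.0, -2.0])
b = 0.0
x = np.array([2.0, 0.5])

# A small signed step flips the logit to the wrong side of the boundary.
x_adv = fgsm_perturb(x, y=1.0, w=w, b=b, eps=1.5)
print(x_adv @ w + b)  # negative logit: the perturbed input is misclassified
```

Libraries like ART wrap this idea (and far stronger attacks) behind estimator and attack objects, plus the poisoning, extraction, and inference threat models the description mentions.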
🐢 Open-Source Evaluation & Testing for LLMs and ML models
An open-source Python toolbox for backdoor attacks and defenses.
[ICML 2024] TrustLLM: Trustworthiness in Large Language Models
[NeurIPS-2023] Annual Conference on Neural Information Processing Systems
Code of the paper: A Recipe for Watermarking Diffusion Models
Neural Network Verification Software Tool
Moonshot - A simple and modular tool to evaluate and red-team any LLM application.
A comprehensive toolbox for model inversion attacks and defenses that is easy to get started with.
Official code repo for the O'Reilly Book - Machine Learning for High-Risk Applications
A toolkit of tools and techniques related to the privacy and compliance of AI models.
A project that adds scalable, state-of-the-art out-of-distribution detection (open-set recognition) support by changing two lines of code. It performs efficient inference (i.e., no increase in inference time) and detection without a drop in classification accuracy, hyperparameter tuning, or collecting additional data.
The official implementation for ICLR23 paper "GNNSafe: Energy-based Out-of-Distribution Detection for Graph Neural Networks"
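The energy-based OOD score that GNNSafe builds on (from Liu et al.'s energy-based OOD detection work) can be sketched in a few lines: score each input by the negative log-sum-exp of its logits, so confident in-distribution inputs get low energy and flat, uncertain logits get high energy. A minimal NumPy sketch, assuming plain classifier logits rather than GNNSafe's actual graph-propagated scores:

```python
import numpy as np

def energy_score(logits, T=1.0):
    """E(x) = -T * logsumexp(logits / T), computed stably.

    Higher energy suggests out-of-distribution input; a threshold on this
    score separates in-distribution from OOD samples.
    """
    z = np.asarray(logits, dtype=float) / T
    m = z.max()
    return -T * (m + np.log(np.exp(z - m).sum()))

print(energy_score([10.0, 0.0, 0.0]))  # confident logits: low energy, ~ -10
print(energy_score([1.0, 1.0, 1.0]))   # flat logits: higher energy, ~ -2.1
```

GNNSafe's contribution is propagating such energy scores over the graph structure so that a node's neighbors inform its OOD decision; the scalar score above is only the starting point.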
[ACM MM 2022] Towards Robust Video Object Segmentation with Adaptive Object Calibration
Principal Image Sections Mapping. Convolutional Neural Network Visualisation and Explanation Framework
A project that improves out-of-distribution detection (open-set recognition) and uncertainty estimation by changing a few lines of code in your project. It performs efficient inference (i.e., no increase in inference time) without repetitive model training, hyperparameter tuning, or collecting additional data.
[USENIX Security 2025] PoisonedRAG: Knowledge Poisoning Attacks to Retrieval-Augmented Generation of Large Language Models
A curated list of awesome academic research, books, code of ethics, data sets, institutes, newsletters, principles, podcasts, reports, tools, regulations and standards related to Responsible AI, Trustworthy AI, and Human-Centered AI.
Official code of "StyleT2I: Toward Compositional and High-Fidelity Text-to-Image Synthesis" (CVPR 2022)
Code of the paper: Finetuning Text-to-Image Diffusion Models for Fairness
[TPAMI, 2023] Fear-Neuro-Inspired Reinforcement Learning for Safe Autonomous Driving
[AAAI 2024] The official repository for our paper, "OUTFOX: LLM-Generated Essay Detection Through In-Context Learning with Adversarially Generated Examples"
[ICCV-2023] Gradient inversion attack, Federated learning, Generative adversarial network.
CARES: A Comprehensive Benchmark of Trustworthiness in Medical Vision Language Models
A tool for comparing the predictions of any text classifiers.
Official code of "Discover and Mitigate Unknown Biases with Debiasing Alternate Networks" (ECCV 2022)
Official code of "Discover the Unknown Biased Attribute of an Image Classifier" (ICCV 2021)
Trustworthy AI method based on Dempster-Shafer theory - application to fetal brain 3D T2w MRI segmentation
[ICML 2024] Prompting4Debugging: Red-Teaming Text-to-Image Diffusion Models by Finding Problematic Prompts (official PyTorch implementation)