Repositories under the trustworthy-ai topic:
Adversarial Robustness Toolbox (ART) - Python Library for Machine Learning Security - Evasion, Poisoning, Extraction, Inference - Red and Blue Teams
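To illustrate the kind of evasion attack ART covers, here is a minimal NumPy sketch of a fast-gradient-sign perturbation against a toy logistic-regression classifier. This is a conceptual sketch only, not ART's actual API; the model, weights, and `fgsm_perturb` helper are all hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_perturb(x, y, w, b, eps):
    """Nudge input x in the direction that increases the logistic loss.

    Hypothetical toy model: p = sigmoid(w @ x + b), cross-entropy loss,
    whose gradient with respect to x is (p - y) * w.
    """
    p = sigmoid(x @ w + b)
    grad_x = (p - y) * w
    return x + eps * np.sign(grad_x)

# A correctly classified point: w @ x + b = 1.0 > 0, so class 1.
w = np.array([1.0, -2.0])
b = 0.0
x = np.array([2.0, 0.5])

# A small signed step flips the logit to the wrong side of the boundary.
x_adv = fgsm_perturb(x, y=1.0, w=w, b=b, eps=1.5)
print(x_adv @ w + b)  # negative logit: the perturbed input is misclassified
```

Libraries like ART wrap this idea (and far stronger attacks) behind estimator and attack objects, plus the poisoning, extraction, and inference threat models the description mentions.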
🐢 Open-Source Evaluation & Testing for LLMs and ML models
An open-source Python toolbox for backdoor attacks and defenses.
[ICML 2024] TrustLLM: Trustworthiness in Large Language Models
[NeurIPS-2023] Annual Conference on Neural Information Processing Systems
Code of the paper: A Recipe for Watermarking Diffusion Models
Neural Network Verification Software Tool
Moonshot - A simple and modular tool to evaluate and red-team any LLM application.
A comprehensive toolbox for model inversion attacks and defenses that is easy to get started with.
Official code repo for the O'Reilly Book - Machine Learning for High-Risk Applications
A toolkit of tools and techniques related to the privacy and compliance of AI models.
A project that adds scalable, state-of-the-art out-of-distribution detection (open-set recognition) support by changing two lines of code. It performs efficient inference (i.e., no increase in inference time) and detection without a drop in classification accuracy, hyperparameter tuning, or collecting additional data.
The official implementation for ICLR23 paper "GNNSafe: Energy-based Out-of-Distribution Detection for Graph Neural Networks"
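The energy-based OOD score that GNNSafe builds on (from Liu et al.'s energy-based OOD detection work) can be sketched in a few lines: score each input by the negative log-sum-exp of its logits, so confident in-distribution inputs get low energy and flat, uncertain logits get high energy. A minimal NumPy sketch, assuming plain classifier logits rather than GNNSafe's actual graph-propagated scores:

```python
import numpy as np

def energy_score(logits, T=1.0):
    """E(x) = -T * logsumexp(logits / T), computed stably.

    Higher energy suggests out-of-distribution input; a threshold on this
    score separates in-distribution from OOD samples.
    """
    z = np.asarray(logits, dtype=float) / T
    m = z.max()
    return -T * (m + np.log(np.exp(z - m).sum()))

print(energy_score([10.0, 0.0, 0.0]))  # confident logits: low energy, ~ -10
print(energy_score([1.0, 1.0, 1.0]))   # flat logits: higher energy, ~ -2.1
```

GNNSafe's contribution is propagating such energy scores over the graph structure so that a node's neighbors inform its OOD decision; the scalar score above is only the starting point.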
[ACM MM 2022] Towards Robust Video Object Segmentation with Adaptive Object Calibration
Principal Image Sections Mapping. Convolutional Neural Network Visualisation and Explanation Framework
A project that improves out-of-distribution detection (open-set recognition) and uncertainty estimation by changing a few lines of code in your project. It performs efficient inference (i.e., no increase in inference time) without repetitive model training, hyperparameter tuning, or collecting additional data.
[USENIX Security 2025] PoisonedRAG: Knowledge Poisoning Attacks to Retrieval-Augmented Generation of Large Language Models
A curated list of awesome academic research, books, code of ethics, data sets, institutes, newsletters, principles, podcasts, reports, tools, regulations and standards related to Responsible AI, Trustworthy AI, and Human-Centered AI.
Official code of "StyleT2I: Toward Compositional and High-Fidelity Text-to-Image Synthesis" (CVPR 2022)
Code of the paper: Finetuning Text-to-Image Diffusion Models for Fairness
[TPAMI, 2023] Fear-Neuro-Inspired Reinforcement Learning for Safe Autonomous Driving
[AAAI 2024] The official repository for our paper, "OUTFOX: LLM-Generated Essay Detection Through In-Context Learning with Adversarially Generated Examples"
[ICCV-2023] Gradient inversion attack, Federated learning, Generative adversarial network.
CARES: A Comprehensive Benchmark of Trustworthiness in Medical Vision Language Models
A tool for comparing the predictions of any text classifiers.
Official code of "Discover and Mitigate Unknown Biases with Debiasing Alternate Networks" (ECCV 2022)
Official code of "Discover the Unknown Biased Attribute of an Image Classifier" (ICCV 2021)
Trustworthy AI method based on Dempster-Shafer theory - application to fetal brain 3D T2w MRI segmentation
[ICML 2024] Prompting4Debugging: Red-Teaming Text-to-Image Diffusion Models by Finding Problematic Prompts (official PyTorch implementation)