Figuring out how to make your AI safer? Wondering how to avoid ethical biases, errors, privacy leaks, or robustness issues in your AI models?
This repository contains a curated list of papers and technical articles on AI Quality & Safety that should help 📚
You can browse papers by Machine Learning task category, and use hashtags like #Robustness to explore AI risk types.
- General ML Testing
- Tabular Machine Learning
- Natural Language Processing
- Computer Vision
- Recommendation System
- Time Series

## General ML Testing

- Machine Learning Testing: Survey, Landscapes and Horizons (Zhang et al., 2020) #General
- Quality Assurance for AI-based Systems: Overview and Challenges (Felderer et al., 2021) #General
- The ML Test Score: A Rubric for ML Production Readiness and Technical Debt Reduction (Breck et al., 2017) #General
- Reliable Machine Learning: Applying SRE Principles to ML in Production [BOOK] (Chen et al., 2022) #Reliability
- Metamorphic testing of decision support systems: A case study (Kuo et al., 2010) #Robustness
- A Survey on Metamorphic Testing (Segura et al., 2016) #Robustness (a code sketch of the idea follows this list)
- Testing and validating machine learning classifiers by metamorphic testing (Xie et al., 2011) #Robustness
- The Disagreement Problem in Explainable Machine Learning: A Practitioner’s Perspective (Krishna et al., 2022) #Explainability
- InterpretML: A Unified Framework for Machine Learning Interpretability (Nori et al., 2019) #Explainability #General
- Fair regression: Quantitative definitions and reduction-based algorithms (Agarwal et al., 2019) #Fairness
- Learning Optimal and Fair Decision Trees for Non-Discriminative Decision-Making (Aghaei et al., 2019) #Fairness
- Towards the Systematic Reporting of the Energy and Carbon Footprints of Machine Learning (Henderson et al., 2020) #Environment
- AI Incident Database (Responsible AI Collaborative)
- AI Vulnerability Database (AVID)
- Machine Learning Model Drift Detection Via Weak Data Slices (Ackerman et al., 2021) #DataSlice #Debugging #Drift
- Automated Data Slicing for Model Validation: A Big Data - AI Integration Approach (Chung et al., 2020) #DataSlice
- Interacting with Predictions: Visual Inspection of Black-box Machine Learning Models (Krause et al., 2016) #Explainability
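
Several of the entries above (Kuo et al., 2010; Segura et al., 2016; Xie et al., 2011) cover metamorphic testing, which checks relations between outputs on related inputs instead of relying on a labelled oracle. Below is a minimal sketch of the idea, assuming scikit-learn and the iris dataset; the relation used here (a decision tree's invariance to positive feature rescaling) is an illustrative choice, not one taken from those papers.

```python
# Metamorphic testing, a minimal sketch assuming scikit-learn and iris data.
# Relation: a decision tree is invariant to rescaling every feature by a
# positive constant, so training and predicting on 2*X must reproduce the
# original predictions; no labelled oracle is needed for this check.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, _ = train_test_split(X, y, random_state=0)

source = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
follow_up = DecisionTreeClassifier(random_state=0).fit(2.0 * X_train, y_train)

assert np.array_equal(source.predict(X_test), follow_up.predict(2.0 * X_test)), \
    "Metamorphic relation violated: rescaling changed the predictions"
```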

## Natural Language Processing

- Beyond Accuracy: Behavioral Testing of NLP Models with CheckList (Ribeiro et al., 2020) #Robustness (a minimal behavioral-test sketch follows this list)
- Towards Robust Personalized Dialogue Generation via Order-Insensitive Representation Regularization (Chen et al., 2023) #Robustness
- Pipelines for Social Bias Testing of Large Language Models (Nozza et al., 2022) #Bias #Ethics
- "Why Should I Trust You?": Explaining the Predictions of Any Classifier (Ribeiro et al., 2016) #Explainability
- A Unified Approach to Interpreting Model Predictions (Lundberg et al., 2017) #Explainability
- Anchors: High-Precision Model-Agnostic Explanations (Ribeiro et al., 2018) #Explainability
- Explanation-Based Human Debugging of NLP Models: A Survey (Lertvittayakumjorn et al., 2021) #Debugging
- SEAL: Interactive Tool for Systematic Error Analysis and Labeling (Rajani et al., 2022) #DataSlice #Explainability
- Data Statements for Natural Language Processing: Toward Mitigating System Bias and Enabling Better Science (Bender and Friedman, 2018) #Bias
- Equalizing Gender Biases in Neural Machine Translation with Word Embeddings Techniques (Font and Costa-jussà, 2019) #Bias
- On Measuring Social Biases in Sentence Encoders (May et al., 2019) #Bias
- BBQ: A Hand-Built Bias Benchmark for Question Answering (Parrish et al., 2022) #Bias
- What Do You See in this Patient? Behavioral Testing of Clinical NLP Models (Van Aken et al., 2021) #Bias
- Beyond Factuality: A Comprehensive Evaluation of Large Language Models as Knowledge Generators (Chen et al., 2023) #Reliability
- Holistic Evaluation of Language Models (Liang et al., 2022) #General
- Learning to summarize from human feedback (Stiennon et al., 2020) #HumanFeedback
- Identifying and Reducing Gender Bias in Word-Level Language Models (Bordia and Bowman, 2019) #Bias
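
CheckList (Ribeiro et al., 2020, above) treats NLP evaluation as behavioral testing, including invariance (INV) tests where a label-preserving perturbation must not change the prediction. Here is a minimal sketch of that idea in plain Python rather than the `checklist` library's own API; `predict_sentiment` is a hypothetical stand-in for a real model.

```python
# A CheckList-style invariance (INV) test, sketched in plain Python.
# `predict_sentiment` is a hypothetical stand-in: a real test would call
# your model or inference API here instead.
def predict_sentiment(text: str) -> str:
    return "negative" if "terrible" in text or "awful" in text else "positive"

# Invariance relation: swapping one person's name for another should not
# change the predicted sentiment.
template = "{name} thought the film was terrible."
names = ["Maria", "John", "Wei", "Fatima"]

predictions = {name: predict_sentiment(template.format(name=name)) for name in names}
assert len(set(predictions.values())) == 1, f"INV failure: {predictions}"
```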

## Computer Vision

- Domino: Discovering Systematic Errors with Cross-Modal Embeddings (Eyuboglu et al., 2022) #DataSlice
- Explaining in Style: Training a GAN to explain a classifier in StyleSpace (Lang et al., 2021) #Robustness
- Model Assertions for Debugging Machine Learning (Kang et al., 2018) #Debugging (see the sketch after this list)
- Uncovering and Mitigating Algorithmic Bias through Learned Latent Structure (Amini et al., 2019) #Bias
- Diversity in Faces (Merler et al., 2019) #Fairness #Accuracy
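
Model assertions (Kang et al., 2018, above) are programmatic predicates over model outputs that flag likely errors at runtime; flickering detections in video are one of the paper's motivating examples. The sketch below simplifies per-frame detector output to sets of class labels; the data and the helper name are hypothetical.

```python
# A sketch of a "model assertion" (after Kang et al., 2018): a predicate
# over detector outputs that flags likely errors at runtime.
def flicker_assertion(detections_per_frame):
    """Flag frames where an object class vanishes for exactly one frame,
    a common signature of a missed detection in video streams."""
    flagged = []
    for i in range(1, len(detections_per_frame) - 1):
        before, here, after = (set(d) for d in detections_per_frame[i - 1:i + 2])
        missing = (before & after) - here  # present on both sides, absent here
        if missing:
            flagged.append((i, missing))
    return flagged

# Hypothetical per-frame output of an object detector.
frames = [{"car"}, {"car"}, set(), {"car"}, {"car"}]
print(flicker_assertion(frames))  # -> [(2, {'car'})]
```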

## Recommendation System

- Beyond NDCG: behavioral testing of recommender systems with RecList (Chia et al., 2021) #Robustness (a sketch in this spirit follows)
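
RecList (Chia et al., 2021, above) argues for behavioral, slice-aware testing of recommenders instead of a single aggregate ranking metric. Below is a small sketch in that spirit, not the `reclist` package's API; all names, data, and the slicing scheme are hypothetical.

```python
# Slice-aware hit-rate check, a sketch in the spirit of RecList's
# behavioral tests. All data below is a hypothetical stand-in.
from collections import defaultdict

def hit_rate_by_slice(recommendations, ground_truth, user_slices, k=10):
    """Fraction of users in each slice whose held-out item is in the top-k."""
    hits, totals = defaultdict(int), defaultdict(int)
    for user, target in ground_truth.items():
        slice_name = user_slices[user]
        totals[slice_name] += 1
        if target in recommendations[user][:k]:
            hits[slice_name] += 1
    return {name: hits[name] / totals[name] for name in totals}

recs = {"u1": ["a", "b"], "u2": ["c", "d"], "u3": ["e", "f"]}
truth = {"u1": "b", "u2": "x", "u3": "e"}
slices = {"u1": "new_users", "u2": "new_users", "u3": "power_users"}
print(hit_rate_by_slice(recs, truth, slices, k=2))
# -> {'new_users': 0.5, 'power_users': 1.0}
# A per-slice gap like this is exactly what a single NDCG number can hide.
```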