Figuring out how to make your AI safer? Wondering how to avoid ethical biases, errors, privacy leaks, or robustness issues in your AI models?
This repository contains a curated list of papers and technical articles on AI Quality & Safety that should help 📚
You can browse papers by Machine Learning task category, and use hashtags like #Robustness to explore AI risk types.
- General ML Testing
- Tabular Machine Learning
- Natural Language Processing
- Computer Vision
- Recommendation System
- Time Series

## General ML Testing

- Machine Learning Testing: Survey, Landscapes and Horizons (Zhang et al., 2020) #General
- Quality Assurance for AI-based Systems: Overview and Challenges (Felderer et al., 2021) #General
- The ML Test Score: A Rubric for ML Production Readiness and Technical Debt Reduction (Breck et al., 2017) #General
- Reliable Machine Learning: Applying SRE Principles to ML in Production [BOOK] (Chen et al., 2022) #Reliability
- Metamorphic testing of decision support systems: A case study (Kuo et al., 2010) #Robustness
- A Survey on Metamorphic Testing (Segura et al., 2016) #Robustness (a code sketch of the idea follows this list)
- Testing and validating machine learning classifiers by metamorphic testing (Xie et al., 2011) #Robustness
- The Disagreement Problem in Explainable Machine Learning: A Practitioner’s Perspective (Krishna et al., 2022) #Explainability
- InterpretML: A Unified Framework for Machine Learning Interpretability (Nori et al., 2019) #Explainability #General
- Fair regression: Quantitative definitions and reduction-based algorithms (Agarwal et al., 2019) #Fairness
- Learning Optimal and Fair Decision Trees for Non-Discriminative Decision-Making (Aghaei et al., 2019) #Fairness
- Towards the Systematic Reporting of the Energy and Carbon Footprints of Machine Learning (Henderson et al., 2020) #Environment
- AI Incident Database (Responsible AI Collaborative)
- AI Vulnerability Database (AVID)
- Machine Learning Model Drift Detection Via Weak Data Slices (Ackerman et al., 2021) #DataSlice #Debugging #Drift
- Automated Data Slicing for Model Validation: A Big Data - AI Integration Approach (Chung et al., 2020) #DataSlice
- Interacting with Predictions: Visual Inspection of Black-box Machine Learning Models (Krause et al., 2016) #Explainability
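
Several of the entries above (Kuo et al., 2010; Segura et al., 2016; Xie et al., 2011) cover metamorphic testing, which checks relations between outputs on related inputs instead of relying on a labelled oracle. Below is a minimal sketch of the idea, assuming scikit-learn and the iris dataset; the relation used here (a decision tree's invariance to positive feature rescaling) is an illustrative choice, not one taken from those papers.

```python
# Metamorphic testing, a minimal sketch assuming scikit-learn and iris data.
# Relation: a decision tree is invariant to rescaling every feature by a
# positive constant, so training and predicting on 2*X must reproduce the
# original predictions; no labelled oracle is needed for this check.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, _ = train_test_split(X, y, random_state=0)

source = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
follow_up = DecisionTreeClassifier(random_state=0).fit(2.0 * X_train, y_train)

assert np.array_equal(source.predict(X_test), follow_up.predict(2.0 * X_test)), \
    "Metamorphic relation violated: rescaling changed the predictions"
```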

## Natural Language Processing

- Beyond Accuracy: Behavioral Testing of NLP Models with CheckList (Ribeiro et al., 2020) #Robustness (a minimal behavioral-test sketch follows this list)
- Towards Robust Personalized Dialogue Generation via Order-Insensitive Representation Regularization (Chen et al., 2023) #Robustness
- Pipelines for Social Bias Testing of Large Language Models (Nozza et al., 2022) #Bias #Ethics
- "Why Should I Trust You?": Explaining the Predictions of Any Classifier (Ribeiro et al., 2016) #Explainability
- A Unified Approach to Interpreting Model Predictions (Lundberg et al., 2017) #Explainability
- Anchors: High-Precision Model-Agnostic Explanations (Ribeiro et al., 2018) #Explainability
- Explanation-Based Human Debugging of NLP Models: A Survey (Lertvittayakumjorn et al., 2021) #Debugging
- SEAL: Interactive Tool for Systematic Error Analysis and Labeling (Rajani et al., 2022) #DataSlice #Explainability
- Data Statements for Natural Language Processing: Toward Mitigating System Bias and Enabling Better Science (Bender and Friedman, 2018) #Bias
- Equalizing Gender Biases in Neural Machine Translation with Word Embeddings Techniques (Font and Costa-jussà, 2019) #Bias
- On Measuring Social Biases in Sentence Encoders (May et al., 2019) #Bias
- BBQ: A Hand-Built Bias Benchmark for Question Answering (Parrish et al., 2022) #Bias
- What Do You See in this Patient? Behavioral Testing of Clinical NLP Models (Van Aken et al., 2021) #Bias
- Beyond Factuality: A Comprehensive Evaluation of Large Language Models as Knowledge Generators (Chen et al., 2023) #Reliability
- Holistic Evaluation of Language Models (Liang et al., 2022) #General
- Learning to summarize from human feedback (Stiennon et al., 2020) #HumanFeedback
- Identifying and Reducing Gender Bias in Word-Level Language Models (Bordia and Bowman, 2019) #Bias
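
CheckList (Ribeiro et al., 2020, above) treats NLP evaluation as behavioral testing, including invariance (INV) tests where a label-preserving perturbation must not change the prediction. Here is a minimal sketch of that idea in plain Python rather than the `checklist` library's own API; `predict_sentiment` is a hypothetical stand-in for a real model.

```python
# A CheckList-style invariance (INV) test, sketched in plain Python.
# `predict_sentiment` is a hypothetical stand-in: a real test would call
# your model or inference API here instead.
def predict_sentiment(text: str) -> str:
    return "negative" if "terrible" in text or "awful" in text else "positive"

# Invariance relation: swapping one person's name for another should not
# change the predicted sentiment.
template = "{name} thought the film was terrible."
names = ["Maria", "John", "Wei", "Fatima"]

predictions = {name: predict_sentiment(template.format(name=name)) for name in names}
assert len(set(predictions.values())) == 1, f"INV failure: {predictions}"
```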

## Computer Vision

- Domino: Discovering Systematic Errors with Cross-Modal Embeddings (Eyuboglu et al., 2022) #DataSlice
- Explaining in Style: Training a GAN to explain a classifier in StyleSpace (Lang et al., 2021) #Robustness
- Model Assertions for Debugging Machine Learning (Kang et al., 2018) #Debugging (see the sketch after this list)
- Uncovering and Mitigating Algorithmic Bias through Learned Latent Structure (Amini et al., 2019) #Bias
- Diversity in Faces (Merler et al., 2019) #Fairness #Accuracy
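
Model assertions (Kang et al., 2018, above) are programmatic predicates over model outputs that flag likely errors at runtime; flickering detections in video are one of the paper's motivating examples. The sketch below simplifies per-frame detector output to sets of class labels; the data and the helper name are hypothetical.

```python
# A sketch of a "model assertion" (after Kang et al., 2018): a predicate
# over detector outputs that flags likely errors at runtime.
def flicker_assertion(detections_per_frame):
    """Flag frames where an object class vanishes for exactly one frame,
    a common signature of a missed detection in video streams."""
    flagged = []
    for i in range(1, len(detections_per_frame) - 1):
        before, here, after = (set(d) for d in detections_per_frame[i - 1:i + 2])
        missing = (before & after) - here  # present on both sides, absent here
        if missing:
            flagged.append((i, missing))
    return flagged

# Hypothetical per-frame output of an object detector.
frames = [{"car"}, {"car"}, set(), {"car"}, {"car"}]
print(flicker_assertion(frames))  # -> [(2, {'car'})]
```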

## Recommendation System

- Beyond NDCG: behavioral testing of recommender systems with RecList (Chia et al., 2021) #Robustness (a sketch in this spirit follows)
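
RecList (Chia et al., 2021, above) argues for behavioral, slice-aware testing of recommenders instead of a single aggregate ranking metric. Below is a small sketch in that spirit, not the `reclist` package's API; all names, data, and the slicing scheme are hypothetical.

```python
# Slice-aware hit-rate check, a sketch in the spirit of RecList's
# behavioral tests. All data below is a hypothetical stand-in.
from collections import defaultdict

def hit_rate_by_slice(recommendations, ground_truth, user_slices, k=10):
    """Fraction of users in each slice whose held-out item is in the top-k."""
    hits, totals = defaultdict(int), defaultdict(int)
    for user, target in ground_truth.items():
        slice_name = user_slices[user]
        totals[slice_name] += 1
        if target in recommendations[user][:k]:
            hits[slice_name] += 1
    return {name: hits[name] / totals[name] for name in totals}

recs = {"u1": ["a", "b"], "u2": ["c", "d"], "u3": ["e", "f"]}
truth = {"u1": "b", "u2": "x", "u3": "e"}
slices = {"u1": "new_users", "u2": "new_users", "u3": "power_users"}
print(hit_rate_by_slice(recs, truth, slices, k=2))
# -> {'new_users': 0.5, 'power_users': 1.0}
# A per-slice gap like this is exactly what a single NDCG number can hide.
```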