A continually updated list of studies from the CSCW 2021 paper, "Problematic Machine Behavior: A Systematic Literature Review of Algorithm Audits"
The repository is a work in progress and open to the community for edits.
To add a study, simply create an issue or pull request.
Algorithm Audit: an empirical study investigating a public algorithmic system for potential problematic behavior.
- empirical study: includes an experiment or analysis (quantitative or qualitative) that generates evidence-based claims with well-defined outcome metrics (a minimal sketch of such a study follows these definitions). It must not be purely an opinion/position paper, although position papers with substantial empirical components were included.
- algorithmic system: any socio-technical system influenced by at least one algorithm. This includes systems that may rely on human judgment and/or other non-algorithmic components, as long as they include at least one algorithm.
- public: a public algorithmic system is one used in a commercial context or another public setting such as law enforcement, education, criminal punishment, or public transportation.
- problematic behavior: in this study refers to discrimination, distortion, exploitation, or misjudgment, as well as various types of behaviors within each of these categories. A behavior is problematic when it causes harm (or potential harm). In the ACM Code of Ethics, examples of harm include "unjustified physical or mental injury, unjustified destruction or disclosure of information, and unjustified damage to property, reputation, and the environment."
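As an illustration of the definitions above, here is a minimal, hypothetical sketch (in Python) of what an audit experiment with a well-defined outcome metric can look like. The `query_system` function and its placeholder results are assumptions made for this example, not data from any real platform or from the studies listed here.

```python
# Minimal sketch of an audit experiment: collect system outputs for paired
# profiles and compute a well-defined outcome metric. Everything here is a
# hypothetical placeholder, not data from any real platform.

def query_system(profile: str, query: str) -> list[str]:
    # Stand-in for real data collection (e.g., scripted browsing or API calls
    # against the audited system); returns the ranked results shown to `profile`.
    placeholder = {
        ("control", "running shoes"): ["a.com", "b.com", "c.com"],
        ("treatment", "running shoes"): ["a.com", "d.com", "e.com"],
    }
    return placeholder.get((profile, query), [])


def overlap_at_k(results_a: list[str], results_b: list[str], k: int = 10) -> float:
    # One possible outcome metric: Jaccard overlap of the top-k results
    # that two profiles see for the same query.
    a, b = set(results_a[:k]), set(results_b[:k])
    return len(a & b) / len(a | b) if a | b else 1.0


for query in ["running shoes"]:
    control = query_system("control", query)
    treatment = query_system("treatment", query)
    print(query, overlap_at_k(control, treatment))
```

A real audit would replace the placeholders with systematic data collection, an appropriate sample, and a metric suited to the behavior under study.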
Discrimination: The algorithm disparately treats or disparately impacts people on the basis of their race, age, gender, location, socioeconomic status, and/or intersectional identity. For example, an algorithm implicated in discrimination may systematically favor people who identify as male, or reinforce harmful stereotypes about elderly people. A minimal sketch of one disparity measurement follows the studies below.
- Detecting price and search discrimination on the internet (Mikians et al., 2012)
- Crowd-assisted search for price discrimination in E-commerce: First results (Mikians et al., 2013)
- Measuring Price Discrimination and Steering on E-commerce Web Sites (Hannak et al., 2014)
- An Empirical Analysis of Algorithmic Pricing on Amazon Marketplace (Chen et al., 2016)
- An Empirical Study on Online Price Differentiation (Hupperich et al., 2018)
- Discrimination in Online Ad Delivery (Sweeney, 2013)
- Algorithmic bias? An empirical study of apparent gender-based discrimination in the display of STEM career ads (Lambrecht and Tucker, 2019)
- Auditing Race and Gender Discrimination in Online Housing Markets (Asplund et al., 2020)
- Auditing for Discrimination in Algorithms Delivering Job Ads (Imana et al., 2021)
- Google search: Hyper-visibility as a means of rendering black women and girls invisible (Noble, 2013)
- Unequal Representation and Gender Stereotypes in Image Search Results for Occupations (Kay et al., 2015)
- Bias in Online freelance marketplaces: Evidence from TaskRabbit and Fiverr (Hannak et al., 2017)
- Investigating the impact of gender on rank in resume search engines (Chen et al., 2018)
- Fairness-aware ranking in search & recommendation systems with application to LinkedIn talent search (Geyik, Ambler, and Kenthapadi, 2019)
- Tracking gendered streams (Eriksson and Johansson, 2017)
- Building and Auditing Fair Algorithms: A Case Study in Candidate Screening (Wilson et al., 2021)
- Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification (Buolamwini and Gebru, 2018)
- Actionable auditing: Investigating the impact of publicly naming biased performance results of commercial AI products (Raji and Buolamwini, 2019)
- Does Object Recognition Work for Everyone? (DeVries et al., 2019)
- Fairness in proprietary image tagging algorithms: A cross-platform audit on people images (Kyriakou et al., 2019)
- Social B(eye)as: Human and machine descriptions of people images (Barlas et al., 2019)
- Why machine learning may lead to unfairness: Evidence from risk assessment for juvenile justice in Catalonia (Tolan et al., 2019)
- The Risk of Racial Bias in Hate Speech Detection (Sap et al., 2019)
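Many of the discrimination audits above compare a system's error rates across demographic or intersectional groups. The sketch below shows that comparison on hypothetical records; the groups, labels, and values are illustrative assumptions, not results from any cited study.

```python
# Sketch: per-group error rates from audit data (hypothetical records).
from collections import defaultdict

# Each record: (group, ground-truth label, system prediction)
records = [
    ("darker-skinned female", "female", "male"),
    ("darker-skinned female", "female", "female"),
    ("lighter-skinned male", "male", "male"),
    ("lighter-skinned male", "male", "male"),
]

errors: dict[str, int] = defaultdict(int)
totals: dict[str, int] = defaultdict(int)
for group, truth, predicted in records:
    totals[group] += 1
    errors[group] += int(predicted != truth)

# A large gap in error rates between groups is the kind of disparity
# these audits report.
for group in totals:
    print(f"{group}: error rate {errors[group] / totals[group]:.2f}")
```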
Distortion: The algorithm presents media that distorts or obscures an underlying reality. For example, an algorithm implicated in distortion may favor content from a given political perspective, hyper-personalize output for different users, change its output frequently and without good reason, or provide misleading information to users. A minimal sketch of one distortion metric follows the studies below.
- Measuring Personalization of Web Search (Hannak et al., 2013)
- Location, Location, Location: The Impact of Geolocation on Web Search Personalization (Kliman-Silver et al., 2015)
- "Be careful; Things can be worse than they appear" - Understanding biased algorithms and users' behavior around them in rating platforms (Eslami et al., 2017)
- Quantifying search bias: Investigating sources of bias for political searches in social media (Kulshrestha et al., 2017)
- From ranking algorithms to ‘ranking cultures’: Investigating the modulation of visibility in YouTube search results (Rieder, Matamoros-Fernández, and Coromina, 2018)
- Challenging Google Search filter bubbles in social and political information: Disconfirming evidence from a digital methods case study (Courtois, Slechten, and Coenen, 2018)
- Search bias quantification: investigating political bias in social media and web search (Kulshrestha et al., 2018)
- Auditing partisan audience bias within Google search (Robertson et al., 2018)
- Beyond the bubble: Assessing the diversity of political search results (Puschmann, 2018)
- You Can’t See What You Can’t See: Experimental Evidence for How Much Relevant Information May Be Missed Due to Google’s Web Search Personalisation (Lai and Luczak-Roesch, 2019)
- Search media and elections: A longitudinal investigation of political search results in the 2018 U.S. Elections (Metaxa et al., 2019)
- Search as news curator: The role of Google in shaping attention to news information (Trielli and Diakopoulos, 2019)
- Dr. Google, what can you tell me about homeopathy? Comparative study of the top10 websites in the United States, United Kingdom, France, Mexico and Spain (Cano-Orón, 2019)
- Comparing Platform “Ranking Cultures” Across Languages: The Case of Islam on YouTube in Scandinavia (Moe, 2019)
- Auditing the partisanship of Google search snippets (Hu et al., 2019)
- Auditing autocomplete: Suggestion networks and recursive algorithm interrogation (Robertson et al., 2019)
- Auditing local news presence on Google News (Fischer et al., 2020)
- Measuring Misinformation in Video Search Platforms: An Audit Study on YouTube (Hussein et al., 2020)
- The Media Coverage of the 2020 US Presidential Election Candidates through the Lens of Google’s Top Stories (Kawakami et al., 2020)
- Auditing Source Diversity Bias in Video Search Results Using Virtual Agents (Urman et al., 2021)
- MapWatch: Detecting and monitoring international border personalization on online maps (Soeller et al., 2016)
- More of the same - On Spotify radio (Snickars, 2017)
- Coding the News: The role of computer code in filtering and distributing news (Weber and Kosterich, 2018)
- Are We Exposed to the Same “News” in the News Feed?: An empirical analysis of filter bubbles as information similarity for Danish Facebook users (Bechmann and Nielbo, 2018)
- Analyzing the news coverage of personalized newspapers (Chakraborty and Ganguly, 2018)
- Opening Up the Black Box: Auditing Google’s Top Stories Algorithm (Lurie and Mustafaraj, 2019)
- Auditing radicalization pathways on YouTube (Ribeiro et al., 2020)
- Auditing News Curation Systems: A Case Study Examining Algorithmic and Editorial Logic in Apple News (Bandy and Diakopoulos, 2020)
- When the Umpire is also a Player: Bias in Private Label Product Recommendations on E-commerce Marketplaces (Dash et al., 2021)
- Auditing E-Commerce Platforms for Algorithmically Curated Vaccine Misinformation (Juneja and Mitra, 2021)
- Investigating Ad Transparency Mechanisms in Social Media: A Case Study of Facebook's Explanations (Andreou et al., 2018)
- Ad Delivery Algorithms: The Hidden Arbiters of Political Messaging (Ali et al., 2019)
- Bias misperceived: The role of partisanship and misinformation in YouTube comment moderation (Jiang, Robertson, and Wilson, 2019)
- “I Can’t Reply with That”: Characterizing Problematic Email Reply Suggestions (Robertson et al., 2021)
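One recurring distortion measurement in the search and news audits above is how concentrated results are on a few sources. The sketch below computes that concentration for hypothetical collected result pages; the URLs and domains are placeholders, not data from any cited study.

```python
# Sketch: source concentration across collected search result pages
# (hypothetical URLs; a real audit would scrape or log actual results).
from collections import Counter
from urllib.parse import urlparse

collected_serps = [
    ["https://example-news.com/a", "https://example-news.com/b",
     "https://other-outlet.org/c"],
    ["https://example-news.com/d", "https://local-paper.net/e"],
]

domain_counts = Counter(
    urlparse(url).netloc for serp in collected_serps for url in serp
)
total = sum(domain_counts.values())
for domain, count in domain_counts.most_common():
    print(f"{domain}: {count / total:.0%} of collected results")
```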
Exploitation: The algorithm inappropriately uses content from other sources and/or sensitive personal information from people. For example, an algorithm implicated in exploitation may infer sensitive personal information from users without proper consent, or feature content from an outside source without attribution. A minimal sketch of one measurement in this category follows the studies below.
- Automated Experiments on Ad Privacy Settings (Datta et al., 2015)
- Studying ad targeting with digital methods: The case of Spotify (Mahler and Vonderau, 2017)
- Unveiling and Quantifying Facebook Exploitation of Sensitive Personal Data for Advertising Purposes (Cabanas, Cuevas, and Cuevas, 2018)
- The Substantial Interdependence of Wikipedia and Google: A Case Study on the Relationship Between Peer Production Communities and Information Technologies (McMahon, Johnson, and Hecht, 2017)
- Measuring the Importance of User-Generated Content to Search Engines (Vincent et al., 2019)
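Several of the exploitation audits above check whether ad platforms attach sensitive interest categories to users. A simple keyword-flagging pass like the hypothetical sketch below is one possible starting point; the keyword list and profiles are illustrative assumptions only.

```python
# Sketch: flag potentially sensitive inferred interests in collected
# ad-preference data (hypothetical profiles and keyword list).
SENSITIVE_KEYWORDS = {"health", "religion", "politics", "sexuality"}

collected_profiles = {
    "user_a": ["running", "diabetes health support", "jazz"],
    "user_b": ["politics news", "cooking"],
}

for user, interests in collected_profiles.items():
    flagged = [
        interest for interest in interests
        if any(keyword in interest.lower() for keyword in SENSITIVE_KEYWORDS)
    ]
    share = len(flagged) / len(interests)
    print(f"{user}: {share:.0%} of inferred interests flagged as sensitive: {flagged}")
```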
Misjudgment: The algorithm makes incorrect predictions or classifications. Notably, misjudgment can often lead to discrimination, distortion, and/or exploitation, but some studies in the review focused on this initial error of misjudgment without exploring second-order problematic effects. For example, an algorithm implicated in misjudgment may incorrectly classify a user's employment status or mislabel a piece of political news as being primarily about sports. A minimal sketch of one validation check follows the studies below.
- Out With the Old and in With the New? An Empirical Comparison of Supervised Learning Algorithms to Predict Recidivism (Duwe and Kim, 2017)
- Better Practices in the Development and Validation of Recidivism Risk Assessments: The Minnesota Sex Offender Screening Tool–4 (Duwe, 2019)
- The right to confront your accusers: Opening the black box of forensic DNA software (Matthews et al., 2019)
- The Accuracy of the Demographic Inferences Shown on Google's Ad Settings (Tschantz et al., 2018)
- Auditing Offline Data Brokers via Facebook's Advertising Platform (Venkatadri et al., 2019)
- Quantity vs. Quality: Evaluating User Interest Profiles Using Ad Preference Managers (Bashir et al., 2019)
- Facebook Ads Monitor: An Independent Auditing System for Political Ads on Facebook (Silva et al., 2020)
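The misjudgment audits above validate a system's predictions against observed outcomes. The sketch below shows the basic tabulation on hypothetical records; real studies use large samples and more careful metrics (e.g., calibration or error rates by subgroup).

```python
# Sketch: validate predictions against later ground truth (hypothetical records).
records = [  # (system prediction, observed outcome)
    ("high risk", "no reoffense"),
    ("low risk", "no reoffense"),
    ("high risk", "reoffense"),
    ("low risk", "reoffense"),
]

false_positives = sum(p == "high risk" and o == "no reoffense" for p, o in records)
false_negatives = sum(p == "low risk" and o == "reoffense" for p, o in records)
accuracy = sum((p == "high risk") == (o == "reoffense") for p, o in records) / len(records)
print(f"accuracy={accuracy:.2f}  false positives={false_positives}  false negatives={false_negatives}")
```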