Robustness Disparities in Face Detection

Abstract

Facial analysis systems have been deployed by large companies and critiqued by scholars and activists for the past decade. Many existing algorithmic audits examine the performance of these systems on later-stage elements of facial analysis, such as facial recognition and age, emotion, or gender prediction; however, a core component of these systems has been vastly understudied from a fairness perspective: face detection. Since face detection is a prerequisite step in facial analysis systems, the bias we observe in face detection will flow downstream to other components such as facial recognition and emotion prediction. Additionally, no prior work has focused on the robustness of these systems under various perturbations and corruptions, which leaves open the question of how different people are impacted by these phenomena. We present the first-of-its-kind detailed benchmark of face detection systems, specifically examining the robustness to noise of commercial and academic models. We use both standard and recently released academic facial datasets to quantitatively analyze trends in face detection robustness. Across all the datasets and systems, we generally find that photos of individuals who are masculine presenting, older, or of darker skin type, as well as photos with dim lighting, are more susceptible to errors than their counterparts in other identities.

About the Benchmark

This benchmark uses four datasets to evaluate the robustness of face detection systems to natural types of noise.

Adience
Casual Conversations Dataset (CCD)
Open Images V6 -- Extended; More Inclusive Annotations for People (MIAP)
UTKFace

For a subset of the images in these datasets, we created 75 corrupted versions following the ImageNet-C pipeline (15 corruption types, each at 5 severity levels).
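As a rough illustration of where the count of 75 comes from, the sketch below enumerates the ImageNet-C corruption types crossed with five severity levels. The function names and the variant naming scheme are hypothetical and are not taken from this repo's code.

```python
import itertools

# The 15 corruption types defined by ImageNet-C.
CORRUPTIONS = [
    "gaussian_noise", "shot_noise", "impulse_noise",
    "defocus_blur", "glass_blur", "motion_blur", "zoom_blur",
    "snow", "frost", "fog", "brightness",
    "contrast", "elastic_transform", "pixelate", "jpeg_compression",
]
SEVERITIES = [1, 2, 3, 4, 5]


def corruption_grid():
    """Return every (corruption, severity) pair: 15 x 5 = 75 variants."""
    return list(itertools.product(CORRUPTIONS, SEVERITIES))


def variant_names(image_id):
    """Hypothetical naming scheme: one clean name plus 75 corrupted names."""
    names = [f"{image_id}/clean"]
    names += [f"{image_id}/{c}_{s}" for c, s in corruption_grid()]
    return names
```

Calling `variant_names("img001")` yields 76 names per source image, matching the "1 clean + 75 corrupted" count used in the rest of this README.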

Subsequently, each image (1 clean + 75 corrupted versions) was passed through the Amazon Web Services Rekognition and Microsoft Azure face detection APIs.
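For readers unfamiliar with the commercial APIs, here is a minimal sketch of a single Rekognition call using boto3's `detect_faces`, plus a helper that converts the returned relative bounding boxes to pixel coordinates. The helper names are hypothetical, the call assumes AWS credentials are already configured, and this is not necessarily how the repo's own scripts are structured.

```python
def detect_faces_rekognition(image_bytes, client=None):
    """Send one image (raw JPEG/PNG bytes) to Amazon Rekognition."""
    if client is None:
        import boto3  # third-party; needs configured AWS credentials
        client = boto3.client("rekognition")
    return client.detect_faces(
        Image={"Bytes": image_bytes},
        Attributes=["DEFAULT"],
    )


def boxes_in_pixels(response, width, height):
    """Convert Rekognition's relative boxes to (x1, y1, x2, y2, confidence).

    Rekognition reports Left/Top/Width/Height as fractions of the image size.
    """
    boxes = []
    for face in response.get("FaceDetails", []):
        bb = face["BoundingBox"]
        left = bb["Left"] * width
        top = bb["Top"] * height
        boxes.append((
            left,
            top,
            left + bb["Width"] * width,
            top + bb["Height"] * height,
            face["Confidence"],
        ))
    return boxes
```

Keeping the parsing step separate from the network call makes it possible to re-process stored responses without calling the API again.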

We evaluated each image on each of the following six face detection models, three of which are produced by academic research groups and three by commercial companies:

Academic Face Detection Models

MogFace
TinaFace
YOLO5Face

Commercial Face Detection Models

About this Repo

We conducted the image corruption and commercial model parts of this benchmark using AWS's S3 and EC2 infrastructure. The image datasets were downloaded to an S3 bucket, processed and corrupted using EC2 instances (primarily c5.large), and then passed through each API using EC2 instances (i3.xlarge), with the responses stored in an S3 bucket. This process was specific to our choices, though any compute environment could be used to reproduce these results. To that end, we include the essential code used to process the images and make the API calls, and omit the data management scripts that are idiosyncratic to the specific process we chose.
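If you want to mirror the "store responses in an S3 bucket" step in your own environment, a sketch along these lines may help. The key layout and function names are assumptions made for illustration and do not reflect the bucket structure we used; `put_object` is the standard boto3 S3 client call.

```python
import json


def response_key(service, dataset, image_id, corruption=None, severity=None):
    """Build a hypothetical S3 key for one API response."""
    variant = "clean" if corruption is None else f"{corruption}_{severity}"
    return f"responses/{service}/{dataset}/{image_id}/{variant}.json"


def store_response(s3_client, bucket, key, response):
    """Serialize an API response to JSON and upload it to S3.

    `s3_client` is expected to behave like boto3.client("s3").
    """
    body = json.dumps(response, default=str).encode("utf-8")
    s3_client.put_object(Bucket=bucket, Key=key, Body=body)
    return key
```

Passing the client in as an argument keeps the function usable with any S3-compatible client, or with a local stub when testing.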

The face_detection folder contains two subdirectories: academic, which has the code to process the academic models, and commercial, which has the code to create the corrupted images and process each one with the commercial APIs.

The docs folder contains the website's code.

Citation

@article{dooley2022robustness,
  title={Robustness Disparities in Face Detection},
  author={Dooley, Samuel and Wei, George Z. and Goldstein, Thomas and Dickerson, John P.},
  journal={Working Paper},
  year={2022}
}

Contact

If you'd like more information, get in contact with us! Happy to share more details or data or answer questions.

dooleys / robustness