
🎉 Update (Aug 2021):

Would you like to test how well your own model performs on challenging generalisation tests, and whether it might even match or outperform human observers? This has never been easier! The comprehensive toolbox at bethgelab:model-vs-human supports all datasets reported here and comes with code to evaluate arbitrary PyTorch / TensorFlow models. Simply load your favourite models, hit run, and get a full PDF report on generalisation behaviour, including ready-to-use figures!
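For orientation, an evaluation run looks roughly like the sketch below. It is modelled on the usage example in the model-vs-human README; names such as Evaluate and DEFAULT_DATASETS are that toolbox's and may have changed since, so please check that repository for the current interface.

```python
# Sketch modelled on the usage example in the model-vs-human README;
# names like Evaluate and DEFAULT_DATASETS belong to that toolbox and
# may have changed since, so treat this as orientation, not documentation.
from modelvshuman import Evaluate
from modelvshuman import constants as c

models = ["resnet50"]            # any model registered with the toolbox
datasets = c.DEFAULT_DATASETS    # includes the datasets from this repository
params = {"batch_size": 64, "print_predictions": True}

Evaluate()(models, datasets, **params)   # runs the models on all datasets
# Figure / report generation is handled by modelvshuman.Plot; see the
# toolbox documentation for the required plotting definition.
```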

Data and materials from
"Comparing deep neural networks against humans:
object recognition when the signal gets weaker"


This repository contains information, data and materials from the paper "Comparing deep neural networks against humans: object recognition when the signal gets weaker" by Robert Geirhos, David H. J. Janssen, Heiko H. Schütt, Jonas Rauber, Matthias Bethge, and Felix A. Wichmann.

The article is available at https://arxiv.org/abs/1706.06969.

Please don't hesitate to contact me at robert.geirhos@bethgelab.org or open an issue if you have any questions!

This README is structured according to the repo's structure: one section per subdirectory (alphabetically).

category-mapping

Contains a .txt file with a mapping from the 16 entry-level MS COCO categories employed in our experiments to the corresponding (fine-grained) ImageNet classes. Further information is provided in the file itself.
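As a rough illustration, the mapping can be used to collapse a network's fine-grained ImageNet predictions to the 16 entry-level categories; the file name and line layout below are assumptions, so check the .txt file for the actual format:

```python
# Rough sketch: collapse fine-grained ImageNet classes to the 16
# entry-level categories. The file name and line layout are assumptions;
# see the .txt file in category-mapping/ for the real format.
def load_mapping(path):
    wnid_to_category = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            wnid, category = line.split()[:2]   # assumed: '<WNID> <category>'
            wnid_to_category[wnid] = category
    return wnid_to_category

# mapping = load_mapping("category-mapping/category_mapping.txt")
# mapping.get("n03041632")  # -> 'knife' (n03041632 is a knife synset)
```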

code

This subdirectory contains all image-manipulation code used in our experiments (conversion to grayscale, adding noise, eidolon distortions, ...). The main method of image-manipulation.py walks you through the various degradations. Note that the eidolon manipulation used in one of our experiments is based on the Eidolon GitHub repository, which you will need to download / clone if you would like to use it. We found and fixed a bug in the Python version of that toolbox and created a pull request in August 2016 (Fixed bug in partial coherence #1), which has not (yet?) been merged as of June 2017. Make sure to collect the files from the pull request as well; otherwise you will get different images!
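For orientation, the grayscale, contrast, and uniform-noise degradations work roughly as in the sketch below; image-manipulation.py is the authoritative implementation (in particular, simply clipping out-of-range pixels after adding noise is a simplification):

```python
# Simplified sketch of the grayscale / contrast / uniform-noise degradations;
# the authoritative implementation is code/image-manipulation.py. Clipping
# out-of-range pixels here is a simplification of what the original code does.
import numpy as np
from PIL import Image

def degrade(path, contrast_level=1.0, noise_width=0.0):
    img = np.asarray(Image.open(path).convert("L"), dtype=np.float64) / 255.0
    img = (img - 0.5) * contrast_level + 0.5       # scale nominal contrast
    if noise_width > 0.0:
        noise = np.random.uniform(-noise_width / 2.0, noise_width / 2.0,
                                  size=img.shape)  # additive uniform noise
        img = np.clip(img + noise, 0.0, 1.0)
    return Image.fromarray(np.uint8(np.round(img * 255.0)))
```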

data-analysis

The data-analysis/ subdirectory contains a main R script, data-analysis.R, which can be used to plot and analyze the data contained in raw-data/. We used R version 3.2.3 for the data analysis.

images

We preprocessed images from the ILSVRC2012 training database as described in the paper (e.g. we excluded grayscale images), retaining 213,555 images in total. The images/ directory contains a .txt file with the names of the retained images. If you would like to obtain the images themselves, check out the ImageNet website. In every experiment, the number of presented images was exactly the same for every entry-level MS COCO category (e.g. dog, car, boat, ...).
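If you would like to sanity-check this balance yourself, a sketch along the following lines counts retained images per entry-level category, reusing the WNID mapping from category-mapping/ (the list-file format is an assumption):

```python
# Sketch: count retained images per entry-level category by combining the
# image-name list with a WNID -> category mapping (see category-mapping/).
# The exact format of the image-name list is an assumption.
from collections import Counter

def count_per_category(image_list_path, wnid_to_category):
    counts = Counter()
    with open(image_list_path) as f:
        for line in f:
            name = line.strip()
            if name:
                wnid = name.split("_")[0]   # e.g. 'n03041632_32377.JPEG'
                counts[wnid_to_category.get(wnid, "unknown")] += 1
    return counts
```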

lab-experiment

experimental-code

Contains the main MATLAB experiment, object_recognition_experiment.m, as well as a .yaml file for every experiment. Each .yaml file specifies the parameter values used in the corresponding experiment (such as the stimulus presentation duration). Some functions depend on our in-house iShow library, which can be obtained from here.
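Purely as an illustration (the file name and key below are invented placeholders; the actual .yaml files define the real parameters), such a parameter file can be read like this:

```python
# Hypothetical sketch of reading a per-experiment .yaml parameter file.
# The file name and the key shown are invented placeholders.
import yaml  # pip install pyyaml

with open("lab-experiment/experimental-code/experiment.yaml") as f:
    params = yaml.safe_load(f)

# A stimulus presentation duration might then be available as e.g.
# params["presentation_duration"]  (placeholder key)
```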

helper-functions

Some of the helper functions are based on other people's code; please check the corresponding files for the copyright notices.

response-screen-icons

These icons appeared on the response screen, and participants were instructed to click on the one corresponding to the presented image. The icons were taken from the MS COCO website.


raw-accuracies

The raw-accuracies/ directory contains a .txt file for each experiment with a table of all accuracies (split by experimental condition and subject/network). This is the data underlying all accuracy plots in the paper, and may be useful if, for example, you would like to plot other networks against our human observers' accuracies. Note that all accuracies reported in these files are percentages.
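Such a table could, for instance, be loaded for custom plots as sketched here; the file name and the whitespace-delimited layout are assumptions, so inspect the .txt files first:

```python
# Sketch: load a raw-accuracies table for custom plots. The file name and
# the whitespace-delimited layout are assumptions about the .txt format.
import pandas as pd

acc = pd.read_csv("raw-accuracies/noise-experiment.txt", sep=r"\s+")
print(acc.head())   # accuracies are percentages, split by condition/subject
```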

raw-data

This directory contains the raw data for all experiments reported in the paper, comprising a total of 39,680 human trials collected in a controlled lab setting. Every .csv raw data file has a header with the columns listed below; here is what they stand for:

  • subj: for DNNs (deep neural networks), the name of the network; for human observers, the subject number. This number is consistent across experiments. Note that subjects were not necessarily given consecutive numbers, so it can be the case that e.g. 'subject-04' does not exist in some or all experiments.

  • session: session number

  • trial: trial number

  • rt: reaction time in seconds, or 'NaN' for DNNs

  • object_response: the response given, or 'na' (no answer) if human subjects failed to respond

  • category: the presented category

  • condition: short indicator of the condition of the presented stimulus. Color-experiment: 'cr' for color, 'bw' for grayscale images; contrast-experiment: 'c100', 'c50', ... 'c01' for 100%, 50%, ... 1% nominal contrast; noise-experiment: '0', '0.03', ... '0.9' for noise width; eidolon-experiment: in the form 'a-b-c', indicating:

    • a is the parameter value for 'reach', in {1,2,4,8,...128}
    • b in {0,3,10} for coherence value of 0.0, 0.3, or 1.0
    • c = 10 for grain value of 10.0 (not varied in this experiment)
  • imagename: the file name of the presented stimulus, e.g. 3841_eid_dnn_1-0-10_knife_10_n03041632_32377.JPEG. This is a concatenation of the following information (separated by '_'); see the parsing sketch after the list:

  1. a four-digit number starting with 0000 for the first image in an experiment; the last image therefore has the number n-1 if n is the number of images in a certain experiment
  2. short code for experiment name, e.g. 'eid' for eidolon-experiment
  3. either e.g. 's01' for 'subject-01', or 'dnn' for DNNs
  4. condition
  5. category (ground truth)
  6. a number (just ignore it)
  7. image identifier in the form a_b.JPEG, with a being the WNID (WordNet ID) of the corresponding synset and b being an integer.
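A small helper along these lines (illustrative, not part of the repository) splits such an imagename into its fields; note that field 7 itself contains an underscore, so a plain split yields eight tokens:

```python
# Illustrative helper, not part of the repository: split an imagename
# from the raw data into its fields. Field 7 (the image identifier)
# itself contains '_', so splitting yields eight tokens in total.
def parse_imagename(name):
    stem = name[:-len(".JPEG")] if name.endswith(".JPEG") else name
    parts = stem.split("_")
    return {
        "index": int(parts[0]),        # four-digit image number
        "experiment": parts[1],        # e.g. 'eid' for eidolon-experiment
        "observer": parts[2],          # e.g. 's01' or 'dnn'
        "condition": parts[3],         # e.g. '1-0-10'
        "category": parts[4],          # ground truth, e.g. 'knife'
        "ignored": parts[5],           # a number (just ignore it)
        "wnid": parts[6],              # WordNet ID of the synset
        "image_id": parts[7],          # integer within the synset
    }

# parse_imagename("3841_eid_dnn_1-0-10_knife_10_n03041632_32377.JPEG")
# -> {'index': 3841, 'experiment': 'eid', 'observer': 'dnn',
#     'condition': '1-0-10', 'category': 'knife', 'ignored': '10',
#     'wnid': 'n03041632', 'image_id': '32377'}
```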
