mxer / reassessed-imagenet

Labels and other data for the paper "Are we done with ImageNet?"

Home Page: https://arxiv.org/abs/2006.07159

Reassessed labels for the ILSVRC-2012 ("ImageNet") validation set

This repository contains data and example code for computing the "ReaL accuracy" on ImageNet used in our paper Are we done with ImageNet?.

Example code for computing ReaL accuracy

The following example code is licensed under the Apache 2.0 license, see LICENSE file. Disclaimer: This is not an officially supported Google product.

NumPy

import json
import numpy as np

real_labels = json.load(open('real.json'))
predictions = np.argmax(get_model_logits(val_images), -1)

# Assuming val_images are ordered correctly (from ILSVRC2012_val_00000001.JPEG to ILSVRC2012_val_00050000.JPEG)
is_correct = [pred in real_labels[i] for i, pred in enumerate(predictions) if real_labels[i]]
real_accuracy = np.mean(is_correct)

# If the images were not sorted, then we need the filenames (val_fnames) to map predictions to labels.
real_labels = {f'ILSVRC2012_val_{(i+1):08d}.JPEG': labels for i, labels in enumerate(json.load(open('real.json')))}
is_correct = [pred in real_labels[val_fnames[i]] for i, pred in enumerate(predictions) if real_labels[val_fnames[i]]]
real_accuracy = np.mean(is_correct)

PyTorch

We hope to make our labels easier to use by integrating them with torchvision.datasets after the release.

Ross Wightman (@rwightman) has now included evaluation on the ReaL labels in his well-known pytorch-image-models repository. Have a look at real_labels.py and the way it is used in validate.py for a good PyTorch usage example.
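
Independently of that repository, here is a minimal sketch (not the official code) of how the same evaluation could look in a plain PyTorch loop. It assumes a hypothetical model and a val_loader that yields batches of (images, indices), where each index is the position of the image in the filename-sorted validation set:

import json
import torch

real_labels = json.load(open('real.json'))  # 50 000 lists, indexed by filename-sorted position

model.eval()  # model and val_loader are assumed to be defined elsewhere
correct, evaluated = 0, 0
with torch.no_grad():
    for images, indices in val_loader:
        preds = model(images).argmax(dim=-1)
        for idx, pred in zip(indices.tolist(), preds.tolist()):
            labels = real_labels[idx]
            if labels:  # images with an empty label list are excluded from the mean
                evaluated += 1
                correct += int(pred in labels)
real_accuracy = correct / evaluated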

TensorFlow Datasets

Our labels are available in the TensorFlow Datasets library.

Usage example:

import tensorflow_datasets as tfds

builder = tfds.builder('imagenet2012_real')
builder.download_and_prepare()  # requires the manually downloaded ILSVRC-2012 validation archive
data = builder.as_dataset(split='validation')
data_iter = data.as_numpy_iterator()
example = next(data_iter)
example['image'], example['original_label'], example['real_label']
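
The real_label field holds the (possibly empty) list of reassessed labels per image, so ReaL accuracy can also be computed straight from the dataset. A minimal sketch, continuing from the data variable above and assuming a hypothetical model that returns logits for a batch of one image:

import numpy as np

correct, evaluated = 0, 0
for example in data.as_numpy_iterator():
    real = example['real_label']  # array of valid labels, possibly empty
    if real.size == 0:
        continue  # images with no valid ReaL label are skipped
    pred = int(np.argmax(model(example['image'][None])))  # model is an assumption
    evaluated += 1
    correct += int(pred in real)
real_accuracy = correct / evaluated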

Description of the files

real.json

This file is a list of 50 000 lists which contain the "Reassessed Labels" used for evaluation in the paper.

The outer index of the list corresponds to the validation files, sorted by name. That is, the first list holds all valid labels for the file ILSVRC2012_val_00000001.JPEG, the second list holds all valid labels for the file ILSVRC2012_val_00000002.JPEG, and so on.

Note that lists can be empty, in which case the file should not be included in the evaluation, nor in computing mean accuracy. These are images where the raters found none of the labels to reasonably fit.
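
For instance, a minimal sketch of loading the file and counting how many images end up excluded this way:

import json

real_labels = json.load(open('real.json'))
assert len(real_labels) == 50000
num_empty = sum(1 for labels in real_labels if not labels)
print(f'{num_empty} of {len(real_labels)} images have no valid label and are skipped')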

scores.npz

This contains the scores for each rated (image, label) pair, as computed by the Dawid & Skene (1979) algorithm. These numbers were used to draw the full precision-recall curve in Figure 3, and can be used if you want to try operating points other than the one we used.

This is a compressed NumPy archive that can be loaded as follows:

import numpy as np

data = np.load('scores.npz')
scores, info = data['tensor'], data['info']

Then, scores is an array of N floats in [0,1], and info is an array of N (fname, label) pairs describing which validation file and label is being scored.
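
For example, a different operating point could be obtained by thresholding these scores. A minimal sketch, assuming the rows of info unpack into (fname, label) pairs as described and using an arbitrary 0.5 threshold purely for illustration:

import collections
import numpy as np

data = np.load('scores.npz')
scores, info = data['tensor'], data['info']

threshold = 0.5  # arbitrary illustrative value, not the operating point used in the paper
labels_per_file = collections.defaultdict(set)
for (fname, label), score in zip(info, scores):
    if score >= threshold:
        labels_per_file[fname].add(label)  # labels kept in whatever format info stores them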

raters.npz, golden.npz, raters_golden.npz

These are the raw rater votes, the "golden" labels provided by the 5 expert raters (paper authors), and all raters' answers to those golden questions.

All files follow this format:

import numpy as np

data = np.load('raters.npz')  # golden.npz and raters_golden.npz load the same way
scores, info, ids = data['tensor'], data['info'], data['ids']

Here, scores is an RxNx3 binary tensor, with a 1 at position (r, n, k) when rater r answered question n with no/maybe/yes for k = 0/1/2, respectively. Again, info is a list of N (fname, label) pairs. Additionally, ids is a list of length R containing rater IDs, which can be used to match raters across raters.npz and raters_golden.npz.
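
For example, a minimal sketch of summarising the votes, assuming that a rater who did not answer a question is represented by an all-zero row:

import numpy as np

data = np.load('raters.npz')
votes, info, ids = data['tensor'], data['info'], data['ids']  # votes has shape R x N x 3

answered = votes.sum(-1) > 0  # R x N, True where rater r answered question n
yes = votes[..., 2]           # R x N, 1 where the answer was "yes"
yes_fraction = yes.sum(0) / np.maximum(answered.sum(0), 1)  # per-question fraction of "yes" votes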

List of ImageNet training files

We also release the list of training filenames used in Section 6 of the paper.
