fgranese / DOCTOR

Advances in Neural Information Processing Systems (NeurIPS 2021)

DOCTOR

DOCTOR aims to identify whether the prediction of a classifier should or should not be trusted, so that the prediction can be accepted or rejected accordingly.

Table of the results

The results in the tables below are reported as AUROC% / FRR% (at 95% TRR).
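
As a rough illustration of how these two numbers can be obtained from per-sample rejection scores, here is a minimal sketch. It assumes that FRR is the fraction of correctly classified samples that get rejected and TRR the fraction of misclassified samples that get rejected, and that a sample is rejected when its score is at or above a threshold. The function names `auroc` and `frr_at_trr` are hypothetical and are not taken from this repository.

```python
def auroc(scores, is_error):
    """Probability that a misclassified sample gets a higher score
    than a correctly classified one (ties count as 1/2)."""
    pos = [s for s, e in zip(scores, is_error) if e]
    neg = [s for s, e in zip(scores, is_error) if not e]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def frr_at_trr(scores, is_error, target_trr=0.95):
    """Smallest FRR over all thresholds whose TRR reaches target_trr.
    A sample is rejected when its score is >= the threshold."""
    pos = [s for s, e in zip(scores, is_error) if e]
    neg = [s for s, e in zip(scores, is_error) if not e]
    best = 1.0  # rejecting everything always reaches TRR = 1 with FRR = 1
    for gamma in sorted(set(scores)):
        trr = sum(p >= gamma for p in pos) / len(pos)
        frr = sum(n >= gamma for n in neg) / len(neg)
        if trr >= target_trr:
            best = min(best, frr)
    return best


# Toy data: a higher score means "more likely to be a wrong prediction".
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
toy_is_error = [True, True, False, True, False, False]
print(f"AUROC {100 * auroc(toy_scores, toy_is_error):.1f} %")
print(f"FRR (95% TRR) {100 * frr_at_trr(toy_scores, toy_is_error):.1f} %")
```

The quadratic pairwise loop keeps the sketch dependency-free; for full test sets a rank-based AUROC would be faster.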

1- Totally-Black-Box (TBB)

Dataset         | D_alpha     | D_beta      | SR          | MHLNB
CIFAR10         | 94 / 17.9   | 68.5 / 18.6 | 93.8 / 18.2 | 92.2 / 30.8
CIFAR100        | 87 / 40.6   | 84.2 / 40.6 | 86.9 / 40.5 | 82.6 / 66.7
TinyImageNet    | 84.9 / 45.8 | 84.9 / 45.8 | 84.9 / 45.8 | 78.4 / 82.3
SVHN            | 92.3 / 38.6 | 92.2 / 39.7 | 92.3 / 38.6 | 87.3 / 85.8
Amazon_Fashion  | 89.7 / 27.1 | 89.7 / 26.3 | 87.4 / 50.1 | - / -
Amazon_Software | 68.8 / 73.2 | 68.8 / 73.2 | 67.3 / 86.6 | - / -
IMDb            | 84.4 / 54.2 | 84.4 / 54.4 | 83.7 / 61.7 | - / -

2- Partially-Black-Box (PBB)

Dataset      | D_alpha     | D_beta      | ODIN        | MHLNB
CIFAR10      | 95.2 / 13.9 | 94.8 / 13.4 | 94.2 / 18.4 | 84.4 / 44.6
CIFAR100     | 88.2 / 35.7 | 87.4 / 36.7 | 87.1 / 40.7 | 50 / 94
TinyImageNet | 86.1 / 43.3 | 85.3 / 45.1 | 84.9 / 45.3 | 59 / 86
SVHN         | 93 / 36.6   | 92.8 / 38.4 | 92.3 / 40.7 | 88 / 54.7

Current package structure

Package
├── data
├── datasets
├── lib_discriminators
│   └── discriminators.py
├── models
│   └── sigmoid_nn.py
├── mystat
│   └── statistics.py
├── plots
├── tests
│   ├── compute_FRR_vs_TRR.py
│   └── test_FRR_vs_TRR.py
├── utils
│   ├── GUI_tools.py
│   ├── dataset_utils.py
│   ├── files_utils.py
│   ├── var_utils.py
│   └── plot_utils.py
├── main.py
├── test_wrapper.py
├── README.md
└── requirements.txt

Parameter Setting

  • T_tbb: temperature scaling in TBB (same for SR)
  • eps_tbb: perturbation magnitude in TBB (same for SR)
  • T_alpha: temperature scaling in PBB for D_alpha
  • eps_alpha: perturbation magnitude in PBB for D_alpha
  • T_beta: temperature scaling in PBB for D_beta
  • eps_beta: perturbation magnitude in PBB for D_beta
  • T_odin: temperature scaling in PBB for ODIN
  • eps_odin: perturbation magnitude in PBB for ODIN
  • T_mhlnb: temperature scaling in PBB for Mahalanobis
  • eps_mhlnb: perturbation magnitude in PBB for Mahalanobis

Name            | T_tbb | eps_tbb | T_alpha | eps_alpha | T_beta | eps_beta | T_odin | eps_odin | T_mhlnb | eps_mhlnb
CIFAR10         | 1     | 0       | 1       | 0.00035   | 1.5    | 0.00035  | 1.3    | 0        | 1       | 0.0002
CIFAR100        | 1     | 0       | 1       | 0.00035   | 1.5    | 0.00035  | 1.3    | 0        | 1       | 0.0002
TinyImageNet    | 1     | 0       | 1       | 0.00035   | 1.5    | 0.00035  | 1.3    | 0        | 1       | 0.0002
SVHN            | 1     | 0       | 1       | 0.00035   | 1.5    | 0.00035  | 1.3    | 0        | 1       | 0.0002
Amazon_Fashion  | 1     | 0       | 1       | 0.00035   | 1.5    | 0.00035  | 1.3    | 0        | 1       | 0.0002
Amazon_Software | 1     | 0       | 1       | 0.00035   | 1.5    | 0.00035  | 1.3    | 0        | 1       | 0.0002
IMDb            | 1     | 0       | 1       | 0.00035   | 1.5    | 0.00035  | 1.3    | 0        | 1       | 0.0002
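
To make the role of the temperature parameters concrete, the sketch below shows the standard temperature-scaled softmax, where logits are divided by T before normalization. It is an illustration only: the function name `softmax_with_temperature` is hypothetical, the repository's own implementation is not shown here, and the input perturbation controlled by the eps_* parameters is omitted because it requires gradients of the network.

```python
import math


def softmax_with_temperature(logits, T=1.0):
    """Softmax of logits / T. T = 1 is the plain softmax;
    T > 1 flattens the distribution, T < 1 sharpens it."""
    scaled = [z / T for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


example_logits = [2.0, 1.0, 0.1]
p_t1 = softmax_with_temperature(example_logits, T=1.0)   # value listed for T_alpha
p_t15 = softmax_with_temperature(example_logits, T=1.5)  # value listed for T_beta
print([round(p, 3) for p in p_t1])
print([round(p, 3) for p in p_t15])
```

With T = 1.5 the largest probability is smaller than with T = 1, which changes the scores that the discriminators compute from the softmax output.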

Dataframe

DOCTOR requires the predictions for a given dataset to be in the following format. Example on CIFAR10:

  • 1,...,10: softmax probability associated with the corresponding class
  • label: predicted class
  • true_label: true class

1    | 2    | 3    | 4    | 5     | 6     | 7   | 8     | 9     | 10    | label | true_label
0.02 | 0.01 | 0.04 | 0.01 | 0.005 | 0.005 | 0.9 | 0.006 | 0.002 | 0.002 | 7     | 7
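
The sketch below shows how a row in this format could be turned into rejection scores. It assumes the scores defined in the DOCTOR paper: for D_alpha, (1 - g) / g with g the sum of squared softmax probabilities, and for D_beta, Pe / (1 - Pe) with Pe = 1 - max softmax probability; a prediction is rejected when its score exceeds a threshold gamma. The comma-separated layout and the helper name `doctor_scores` are assumptions made for illustration and are not taken from the repository code.

```python
import csv
import io

# The example row above, written as comma-separated text (assumed layout).
CSV_TEXT = """1,2,3,4,5,6,7,8,9,10,label,true_label
0.02,0.01,0.04,0.01,0.005,0.005,0.9,0.006,0.002,0.002,7,7
"""


def doctor_scores(probs):
    """Return (alpha_score, beta_score) for one softmax vector."""
    g = sum(p * p for p in probs)  # sum of squared probabilities
    pe = 1.0 - max(probs)          # estimated probability of error
    return (1.0 - g) / g, pe / (1.0 - pe)


rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
row = rows[0]

# Every column except the two label columns holds a class probability.
class_cols = [c for c in row if c not in ("label", "true_label")]
probs = [float(row[c]) for c in class_cols]

alpha_score, beta_score = doctor_scores(probs)
is_error = row["label"] != row["true_label"]  # ground truth for evaluation

print(f"alpha score {alpha_score:.4f}, beta score {beta_score:.4f}, misclassified: {is_error}")
```

Both denominators are strictly positive for any valid softmax vector, since the largest probability is at least 1 / (number of classes).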

Dataframes are stored in the corresponding directories. For CIFAR10:

├── data
│   ├── cifar10_T_1_eps_0_test.csv
│   ├── cifar10_T_1_eps_0_train.csv
│   └── cifar10_T_1_eps_0_train_logits.csv
├── data_perturb
│   └── cifar10_T_1.3_eps_0_pt_odin_test.csv
├── data_perturb_our
│   ├── cifar10_T_1.5_eps_0.00035_pt_beta_test.csv
│   ├── cifar10_T_1_eps_0.0002_pt_mahalanobis_test_logits.csv
│   └── cifar10_T_1_eps_0.00035_pt_alpha_test.csv
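
The file names in this listing appear to follow a single pattern: dataset name, temperature, perturbation magnitude, an optional `pt_<method>` tag, the split, and an optional `_logits` suffix. The helper below reproduces that pattern as inferred from the names above. `prediction_csv_name` is a hypothetical function, not part of the repository; T and eps are passed as strings so they appear exactly as written.

```python
def prediction_csv_name(dataset, T, eps, split, method=None, logits=False):
    """Build a file name following the pattern seen in the listing above.

    dataset: e.g. "cifar10"; T, eps: strings such as "1.5", "0.00035";
    split: "train" or "test"; method: e.g. "odin", "alpha", "beta",
    "mahalanobis" (None for unperturbed data); logits: add "_logits".
    """
    name = f"{dataset}_T_{T}_eps_{eps}"
    if method is not None:
        name += f"_pt_{method}"
    name += f"_{split}"
    if logits:
        name += "_logits"
    return name + ".csv"


print(prediction_csv_name("cifar10", "1", "0", "test"))
print(prediction_csv_name("cifar10", "1.5", "0.00035", "test", method="beta"))
print(prediction_csv_name("cifar10", "1", "0.0002", "test", method="mahalanobis", logits=True))
```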

Usage

A clean execution of DOCTOR is in:

tests/test_FRR_vs_TRR.py

To execute it:

  • Create the environment for DOCTOR:
foo@bar:~$ conda create --name doctor python=3.8
  • Activate the environment for DOCTOR:
foo@bar:~$ source activate doctor
  • Install all the required packages:
(doctor) foo@bar:~$ pip install -r requirements.txt
  • Launch the test from the CLI for CIFAR10:
(doctor) foo@bar:~$ python main.py -d_name cifar10 -sc tbb
(doctor) foo@bar:~$ python main.py -d_name cifar10 -sc pbb

Output:

(doctor) foo@bar:~$ python main.py -d_name cifar10 -sc pbb -ood 
ALPHA: AUROC 95.2 % --- FRR (95% TRR) 13.9 %
BETA: AUROC 94.8 % --- FRR (95% TRR) 13.4 %
ODIN: AUROC 94.2 % --- FRR (95% TRR) 18.4 %
MAHALANOBIS: AUROC 84.4 % --- FRR (95% TRR) 44.6 %

Experiments with OOD samples:

(doctor) foo@bar:~$ python main.py -d_name isun_cifar10 -sc pbb -ood True
ALPHA: AUROC 95.6 % / 0.1 % --- FRR 15.1 % / 0.1 %
BETA: AUROC 95.6 % / 0.0 % --- FRR 13.6 % / 0.5 %
ODIN: AUROC 95.4 % / 0.0 % --- FRR 16.1 % / 0.2 %
ODIN (DEFAULT SETTING OF ODIN) : AUROC 93.5 % / 0.0 % --- FRR 30.6 % / 0.4 %

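Note that the dataset name to set follows the pattern out-dataset-name_in-dataset-name.csv. In the example above, isun is the out-of-distribution dataset and cifar10 is the in-distribution dataset.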

Click here to download the datasets for OOD experiments.

Environment

We run each experiment on a machine equipped with an Intel(R) Xeon(R) CPU E5-2623 v4, 2.60GHz clock frequency, and a GeForce GTX 1080 Ti GPU.

We test this clean execution on a machine equipped with Intel(R) Core(TM) i7-8569U CPU @ 2.80GHz.

License: MIT License