marketdesignresearch / NOMU

NOMU: Neural Optimization-based Model Uncertainty

Home Page: https://arxiv.org/abs/2102.13640



NOMU: Neural Optimization-based Model Uncertainty

Published at ICML 2022

This software is used to perform experiments on NOMU uncertainty bounds and four popular benchmarks: (i) Monte Carlo Dropout [1], (ii) Deep Ensembles [2], (iii) Hyper Deep Ensembles [3], and (iv) Gaussian Processes. The experiments are described in detail in the corresponding paper:

NOMU: Neural Optimization-based Model Uncertainty
Jakob Weissteiner, Jakob Heiss, Hanna Wutte, Sven Seuken, and Josef Teichmann.
In Proceedings of the 39th International Conference on Machine Learning (ICML’22), Baltimore, USA, July 2022.
Full paper version including appendix: [pdf]

A. Requirements

  • Python>=3.7

B. Dependencies

Prepare your Python environment (conda, virtualenv, etc.) and run the following after activating your environment. For the regression experiments, use the requirements.txt from the folder regression. For the Bayesian optimization experiments, use the requirements.txt from the folder bayesian_optimization.
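For example, a fresh conda environment could be prepared like this (the environment name nomu is an arbitrary choice):

$ conda create -n nomu python=3.7
$ conda activate nomu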

Using pip:

$ pip install -r requirements.txt

NOTE: On some operating systems, GPU-related issues with TensorFlow 2.3.0 can occur. These issues can be resolved by not using the GPU and computing everything on the CPU. To do so, copy the code below to the top of the respective script:

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

C. Regression Experiments

How to run

First navigate to the folder regression.

C.0 To run a regression experiment with your own data set

  1. Open the file run_NOMU_own_your_own_data.py

  2. In the following example, we provide example data x_data.json and y_data.json. To use your own data set replace those in the code below and load your own data instead.

# %% DATA PREPARATION

# provided example data (stems from a GaussianBNN)
#############
x = np.asarray(json.load(open('x_data.json')))
y = np.asarray(json.load(open('y_data.json')))
n_train = x.shape[0]
input_dim = x.shape[1]
#############
  3. Next, the input x and the output y are scaled to $[-1, 1]^{input\_dim}$ and $[-1, 1]$, respectively.
# 1. scale training data: x to [-1,1]^input_dim, y to [-1,1]^1 (recommended)
normalize_data = True  # recommended to set to True for better learning

if normalize_data:
    x_maxs = np.max(x, axis=0)
    x_mins = np.min(x, axis=0)
    y_max = np.max(y)
    y_min = np.min(y)

    for i, x_min in enumerate(x_mins):
        x[:, i] = 2 * ((x[:, i] - x_min) / (x_maxs[i] - x_min)) - 1

    y = 2 * ((y - y_min) / (y_max - y_min)) - 1

    print(f"\nScaled X-Training Data of shape {x.shape}")
    print(x)
    print(f"\nScaled y-Training Data of shape {y.shape}")
    print(y)
  4. Next, select n_art, the number of artificial input data points for the NOMU loss term (c). Currently, these are sampled uniformly from $[-1.1,1.1]$. If you remove or change the scaling from step 3 above, you also need to change the range from which the artificial input data points are sampled uniformly at random, i.e., x_min_art and x_max_art. NOTE: these changes have to be propagated to the nomu class (fit method, set_augmentation_bounds method, etc.).
# 2. generate NOMU input: add artificial (also called augmented) input data points for NOMU-loss term (c); in this example sampled uniformly at random
#############
n_art = 200  # number of artificial (augmented) input data points.
#############

aug_in_training_range = False  # sample artificial training data only in training data range? If False, they are sampled from the normalized range.
aug_range_epsilon = 0.05

# find range to sample augmented data from
if aug_in_training_range:
    x_min_art = np.min(x, axis=0)
    x_max_art = np.max(x, axis=0)
else:
    x_min_art = -1
    x_max_art = 1
margin = (x_max_art - x_min_art) * aug_range_epsilon
x_min_art -= margin
x_max_art += margin

x_art = np.random.uniform(low=x_min_art, high=x_max_art, size=(n_art, x.shape[1])) # if you activate MCaug=True, these values do not matter because they will be overwritten internally by NOMU
y_art = np.ones((n_art, 1)) # these values do not matter, only the dimension matters
x = np.concatenate((x, np.zeros((n_train, 1))), axis=-1) # add 0-flag identifying a real training point
x_art = np.concatenate((x_art, np.ones((x_art.shape[0], 1))), axis=-1) # add 1-flag identifying an artificial training point

x_nomu = np.concatenate((x, x_art))
y_nomu = np.concatenate((np.reshape(y, (n_train, 1)), y_art))

print(f'\nX NOMU Input Data of shape {x_nomu.shape} (real training points:{n_train}/artificial training points:{n_art})')
print(x_nomu)
print(f'\ny NOMU Input Data of shape {y_nomu.shape} (real training points:{n_train}/artificial training points:{n_art})')
print(y_nomu)
  5. Next, select NOMU's hyperparameters (HPs).
# %% NOMU HPs
layers = (input_dim, 2 ** 10, 2 ** 10, 2 ** 10, 1)  # layers incl. input and output
epochs = 2 ** 10
batch_size = 32
l2reg = 1e-8  # L2-regularization on weights of \hat{f} network
l2reg_sig = l2reg  # L2-regularization on weights of \hat{r}_f network
seed_init = 1  # seed for weight initialization

# (b) optimizer
# ----------------------------------------------------------------------------------------------------------------------------
optimizer = "Adam"  # select optimizer stochastic gradient descent: 'SGD' or adaptive moment estimation: 'Adam'

# (c) loss parameters
# ----------------------------------------------------------------------------------------------------------------------------
MCaug = True  # Monte Carlo approximation of the integrals in the NOMU loss with uniform sampling
mu_sqr = 0.1  # weight of squared-loss (\pi_sqr from paper)
mu_exp = 0.01  # weight exponential-loss (\pi_exp from paper)
c_exp = 30  # constant in exponential-loss
side_layers = (input_dim, 2 ** 10, 2 ** 10, 2 ** 10, 1)  # r-architecture
r_transform = "custom_min_max"  # either 'id', 'relu_cut' or 'custom_min_max' (latter two use r_min and r_max).
r_min = 1e-3  # minimum model uncertainty for numerical stability
r_max = 2  # asymptotically maximum model uncertainty
  6. Next, run NOMU. This creates a folder NOMU_real_data_<date>_<time> in which a training history plot is saved.
# %% RUN NOMU
start0 = datetime.now()
foldername = "_".join(["NOMU", 'real_data', start0.strftime("%d_%m_%Y_%H-%M-%S")])
savepath = os.path.join(os.getcwd(), foldername)
os.mkdir(savepath)  # a FileExistsError is raised automatically if the folder already exists
verbose = 0
#
nomu = NOMU()
nomu.set_parameters(
    layers=layers,
    epochs=epochs,
    batch_size=batch_size,
    l2reg=l2reg,
    optimizer_name=optimizer,
    seed_init=seed_init,
    MCaug=MCaug,
    n_train=n_train,
    n_aug=n_art,
    mu_sqr=mu_sqr,
    mu_exp=mu_exp,
    c_exp=c_exp,
    r_transform=r_transform,
    r_min=r_min,
    r_max=r_max,
    l2reg_sig=l2reg_sig,
    side_layers=side_layers,
    normalize_data=normalize_data,
    aug_in_training_range=aug_in_training_range,
    aug_range_epsilon=aug_range_epsilon,
    )

nomu.initialize_models(verbose=verbose)
nomu.compile_models(verbose=verbose)
nomu.fit_models(x=x_nomu,
                y=y_nomu,
                x_min_aug = x_min_art,
                x_max_aug = x_max_art,
                verbose=verbose)
nomu.plot_histories(yscale="log",
                    save_only=True,
                    absolutepath=os.path.join(savepath, "Plot_History_seed_"+ start0.strftime("%d_%m_%Y_%H-%M-%S")))

end0 = datetime.now()
print("\nTotal Time Elapsed: {}d {}h:{}m:{}s".format(*timediff_d_h_m_s(end0 - start0)),
      "(" + datetime.now().strftime("%H:%M %d-%m-%Y") + ")",
)

Finally, use the fitted NOMU model's model uncertainty and mean output as follows:

# %% HOW TO USE NOMU OUTPUTS
new_x = np.array([[0,0],[0.5,0.5]]) # 2 new input points
predictions = nomu.calculate_mean_std(new_x) # predict mean and model uncertainty

mean, sigma_f = predictions['NOMU_1'] # extract them

if normalize_data:
    print(f"\nScaled-[-1,1]-Predictions mean:{mean} | sigma_f:{sigma_f}")

    # rescale them to the original scale
    mean_orig = (y_max - y_min) * (mean + 1) / 2 + y_min
    sigma_f_orig = (y_max - y_min) * (sigma_f + 1) / 2 + y_min
    print(f"\nPredictions mean:{mean_orig} | sigma_f:{sigma_f_orig}")
else:
    print(f"\nPredictions mean:{mean} | sigma_f:{sigma_f}")
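NOTE: If normalize_data was set to True above, new query points given on the original input scale should first be mapped to $[-1,1]^{input\_dim}$ with the same transform that was applied to the training data. A minimal sketch, assuming the scaling statistics x_mins and x_maxs from above are still in scope and using hypothetical query points:

new_x_orig = np.array([[0.2, 0.3]])  # hypothetical query points on the original input scale
new_x_scaled = 2 * ((new_x_orig - x_mins) / (x_maxs - x_mins)) - 1  # same transform as for the training data
predictions = nomu.calculate_mean_std(new_x_scaled)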

C.1 Toy Regression (Section 4.1.1)

To run a toy regression experiment on any of the provided test functions over multiple seeds

  1. Set the desired test function, seeds and parameters in the file simulation_toy_regression_section_4.1.1.py.
  2. Run:
    $ python simulation_toy_regression_section_4.1.1.py

In addition to the console printout, a folder Multiple_Seeds_<function_name>_<date>_<time> will be created in the regression folder, containing the UBs-plots, history-plots and metric-plots.

NOTE: Parameters are set in simulation_toy_regression_section_4.1.1.py such that one run for the Levy1D function is performed, which reconstructs Figure 4 from the paper (runtime ~5 min).

NOTE: To enable a head-to-head comparison of Table 1 and Tables 5-8:

  1. For the 500 runs in 1D and 2D regression, we use the seeds 501--1000 (unparallelized runtime on a single local machine for one test function ~5*500min≈42h). Concretely, we then set in simulation_toy_regression_section_4.1.1.py the parameters
    • number_of_instances = 500
    • my_start_seed = 501
  2. We conduct these experiments on
    • system: Linux
    • version: SMP Debian 4.19.160-2 (2020-11-28)
    • platform: Linux-4.19.0-13-amd64-x86_64-with-debian-10.8
    • machines: Intel Xeon E5-2650 v4 2.20GHz processors with 48 logical cores and 128GB RAM and Intel E5 v2 2.80GHz processors with 40 logical cores and 128GB RAM
    • python: Python 3.7.3 [GCC 8.3.0] on linux
  3. These experiments are conducted with Tensorflow using the CPU only and no GPU

C.2 Generative Test-Bed (Section 4.1.2)

To run a generative test-bed experiment over multiple seeds

  1. Set the desired dimension, seeds and parameters in the file simulation_generative_testbed_section_4.1.2.py.
  2. Run:
    $ python simulation_generative_testbed_section_4.1.2.py

In addition to the console printout, a folder Multiple_Seeds_GaussianBNN_<date>_<time> will be created in the regression folder, containing the UBs-plots, history-plots and metric-plots.

NOTE: Parameters are set in simulation_generative_testbed_section_4.1.2.py such that one run for a 1D GaussianBNN test function is performed (runtime ~5 mins)

NOTE: To enable a head-to-head comparison of Table 2:

  1. For the 200 runs, we use the seeds 501--700 (unparallelized runtime on a single local machine for one test function ~5*200min≈17h). Concretely, we then set in simulation_generative_testbed_section_4.1.2.py the parameters
    • number_of_instances = 200
    • my_start_seed = 501
  2. We conduct these experiments on
    • system: Linux
    • version: SMP Debian 4.19.160-2 (2020-11-28)
    • platform: Linux-4.19.0-13-amd64-x86_64-with-debian-10.8
    • machines: Intel Xeon E5-2650 v4 2.20GHz processors with 48 logical cores and 128GB RAM and Intel E5 v2 2.80GHz processors with 40 logical cores and 128GB RAM
    • python: Python 3.7.3 [GCC 8.3.0] on linux
  3. These experiments are conducted with Tensorflow using the CPU only and no GPU

C.3 Solar Irradiance Time Series (Section 4.1.3)

To run the solar irradiance data interpolation [4]

  1. Set the desired parameters in the file simulation_solar_irradiance_section_4.1.3.py.
  2. Run:
    $ python simulation_solar_irradiance_section_4.1.3.py

When finished a folder called Irradiance_<date>_<time> will be created in the regression folder, where the UBs-plots, history-plots and metric-plots can be found.

NOTE: To enable a head-to-head comparison of Figure 5 and Figure 13:

  1. For this experiment, we set the seed to 655 (runtime ~2h). Concretely, we set in simulation_solar_irradiance_section_4.1.3.py the parameter
    • SEED=655
  2. These experiments were conducted on
    • system: Linux
    • version: Fedora release 32 (Thirty Two)
    • platform: Linux-5.8.12-200.fc32.x86_64-x86_64-with-glibc2.2.5
    • machines: Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz processors with 4 cores and 15GB RAM
    • python: Python 3.8.7 [GCC 10.2.1 20201125 (Red Hat 10.2.1-9)] on linux
  3. These experiments are conducted with Tensorflow using the CPU only and no GPU

C.4 UCI data sets (Section 4.1.4)

To run the regression on a UCI or UCI gap data set

  1. Set the desired parameters in the file simulation_uci_section_4.1.4.py.
  2. Run:
    $ python simulation_uci_section_4.1.4.py [experiment type] [gap dimension] [seed]

where experiment type is either 'UCI' or 'UCI-Gap', gap dimension is the dimension of the input training data in which a gap should be introduced (ignored when experiment type=='UCI'), and seed is a base seed for the experiment (the train/val/test split and initializations build upon this seed).
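For example, to run a UCI-Gap experiment with a gap introduced in input dimension 0 and base seed 1 (argument values chosen purely for illustration):

$ python simulation_uci_section_4.1.4.py UCI-Gap 0 1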

When finished a folder called [experiment type]_[data set name]_<date>_<time> will be created in the regression folder, where models are saved.

NOTE: Parameters are set in simulation_uci_section_4.1.4.py such that the dataset Boston is used

NOTE: For creating Table 3 (UCI) we used the seeds 1--20.

NOTE: For creating Table 13 (UCI Gap) we used the seeds 0--din, where din denotes the input dimension of the corresponding UCI-Gap dataset, i.e., we conduct din-many runs, creating a gap in one input dimension per run.

D. Bayesian Optimization Experiments

How to run

To run a Bayesian optimization experiment, create or adjust the configuration file that defines all parameters used for the experiment run and save it as a '.ini' file. Once the configuration file is prepared, run the script "simulation_BO.py" located in the bayesian_optimization folder with the path to the configuration file as the first argument, for example:

$ python simulation_BO.py ./example_config.ini

NOTE: Parameters are set in /example_config.ini such that one run for the Levy5D function is performed with an MW scaling of 0.5 and 64 BO steps. This reconstructs one run of Figure 17d in the appendix (unparallelized runtime on a single local machine with CPU-only TensorFlow ~12h).

When using the DIRECT optimizer for the acquisition function optimization, some additional steps may be required, depending on the operating system, to run the scipydirect implementation of the DIRECT algorithm.

First, install the Fortran compilers in your conda environment using the following command (not required on Linux systems):

$ conda install -c msys2 m2w64-gcc

If the code still does not work, copy the file "bayesian_optimization/libs/direct.cp37-win_amd64.pyd" (Windows) or "bayesian_optimization/libs/direct.cpython-37m-x86_64-linux-gnu.so" (Linux) into the scipydirect folder of your environment's library. Usually this folder is located under 'Anaconda/envs//Lib/site-packages/scipydirect' (Windows with Anaconda). Make sure the requirements from requirements.txt are already installed.

NOTE: Running a Bayesian optimization experiment over multiple steps, even for a single seed and a single test function, can take multiple hours (GPU-supported TensorFlow is thus recommended).

Configuration

The Bayesian optimization experiment can be configured using a config file. With this config file, the different algorithms and all subprocesses can be configured. The config file is of filetype .ini and looks like the following example:

[General]
seeds = 1

[BO]
function = levy5D
output_path = 
steps = 64
n_train = 8
lower_bounds = -1.0, -1.0, -1.0, -1.0, -1.0
upper_bounds = 1.0, 1.0, 1.0, 1.0, 1.0


[Optimizer]
optimizer = adam
learning_rate = 0.001
beta_1 = 0.9
beta_2 = 0.999
epsilon = 1e-07
amsgrad = false

[Acquisition]
function = upper_bound
factor = 1.0

    [[mean_width_scaling_mc]]
    order=1
    scale_mean_width=0.5
    lower_bound=-1., -1., -1., -1., -1.
    upper_bound=1., 1., 1., 1., 1.
    n_test_points=20000
    once_only=true

[Acquisition Optimizer]
optimizer = direct
lower_search_bounds = -1.0, -1.0, -1.0, -1.0, -1.0
upper_search_bounds = 1.0, 1.0, 1.0, 1.0, 1.0

	[[dynamic_c_exp]]
	x_range = 2.0
	n_start_points = 8
	range_fraction = 0.25 
	end_eps = 0.01
	start_eps =
	max_increase_iter = 15
	n_steps = 64

[GP]
alpha=1e-7
optimizer=fmin_l_bfgs_b
n_restarts_optimizer=10
normalize_y=true
copy_X_train=true
random_state=
std_min=1e-6
kernel_once=false

    [[Kernel]]
    kernel=rbf
    constant_value=1
    constant_value_bounds=1e-5,1e5
    length_scale=1
    length_scale_bounds=1e-5,1e5

[NOMU]
epochs = 1024
r_max = 2.0
r_min = 1e-6
mip = false
main_layers = 5, 1024, 1024, 1024, 1
side_layers = 5, 1024, 1024, 1024, 1
lowerbound_x_aug = -1.0, -1.0, -1.0, -1.0, -1.0
upperbound_x_aug = 1.0, 1.0, 1.0, 1.0, 1.0
n_aug = 500
c_aug = 500
mu_sqr = 1.0
mu_abs = 0.0
mu_exp = 0.01
c_exp = 30
c_2 =
seed = 3
l2reg = 1e-8
activation = relu
RSN = false

[DO]
epochs = 1024
n_samples = 10
layers = 5, 1024, 2048, 1024, 1
activation = relu
RSN = false
dropout = 0.2
seed = 3
loss = mse
l2reg = 1e-8
normalize_regularization=true

[DE]
epochs = 1024
n_ensembles = 5
random_seed = true
layers = 5, 256, 1024, 512, 1
activation = relu
RSN = false
l2reg = 1e-8
softplus_min_var = 1e-6
s = 0.05
seed = 3
loss =
no_noise = true
normalize_regularization=true

[HDE]
epochs = 1024
K = 5
kappa = 5
test_size = 0.2
random_seed = false
global_seed = 1
layers = 5, 256, 1024, 512, 1
activation = relu
RSN = false
l2reg = 1e-8
softplus_min_var = 1e-6
s = 0.05
seed = 3
loss =
no_noise = true
normalize_regularization=true
dropout_probability_range=0.001, 0.9
l2reg_range=0.001,1000
fixed_row_init=true

Each section of the config file (indicated with "[]") defines a certain part of the whole algorithm. In the following, we highlight for each section which parameters and submodules can be configured and explain all parameters relevant for the paper.

General (required)

This section defines the general setup of the experiment. In particular, it defines the number of runs and the seeds for these runs.

parameter | Explanation | Example | Can be empty
seeds | list of integers: defines the seeds for the different runs, which set the random starting samples; the number of integers in the list defines how many runs will be conducted | 1,2,3,4,5 | No

BO (required)

This section defines the setup of the Bayesian optimization: which function to test, how many steps to take, and how many initial samples to use.

parameter | Explanation | Example | Can be empty
function | string defining which function to use (full list of possible strings below) | levy5D | No
output_path | path where to store the output files | ./some_path | No
steps | integer defining how many steps the Bayesian optimization should take | 64 | No
n_train | integer defining how many starting samples should be sampled | 8 | No
lower_bounds | list which defines the lower bounds of the input space | -1.0, -1.0, -1.0, -1.0, -1.0 | No
upper_bounds | list which defines the upper bounds of the input space | 1.0, 1.0, 1.0, 1.0, 1.0 | No

Possible functions to use:

  • forrester
  • levy
  • sinone
  • branin2D
  • camelback2D
  • goldstein_price2D
  • levy5D
  • levy10D
  • levy20D
  • rosenbrock2D
  • rosenbrock5D
  • rosenbrock10D
  • rosenbrock20D
  • perm2D
  • perm5D
  • perm10D
  • perm20D
  • g_function2D
  • g_function5D
  • g_function10D
  • g_function20D
  • schwefel3D
  • hartmann3D
  • hartmann6D
  • michalewicz2D
  • michalewicz5D
  • michalewicz10D
  • michalewicz20D

Acquisition (required)

This section defines the acquisition function that should be used for the Bayesian optimization

parameter | Explanation | Example | Can be empty
function | string defining which acquisition function to use (full list of possible options below) | upper_bound | No
factor | factor by which the uncertainty width is multiplied for upper bounds | 1 | required for upper_bound
xi | trade-off parameter for Probability of Improvement (PoI) and Expected Improvement (EI) | 0.1 | required for PoI and EI

Possible acquisition functions to use:

  • mean_only (resolve to the mean prediction value only)
  • uncertainty_only (resolve to the uncertainty prediction value only)
  • upper_bound (upper bound (mean + uncertainty))
  • probability_of_improvement (probability of improvement (PoI))
  • expected_improvement (expected improvement (EI))
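As a rough illustration only (not the repository's implementation, which may e.g. handle minimization, scaling, and calibration differently), the listed acquisition functions can be expressed in terms of a model's mean and uncertainty predictions as follows, where y_best denotes the best observed function value:

import numpy as np
from scipy.stats import norm

def upper_bound(mean, sigma, factor=1.0):
    # mean prediction plus the (scaled) model uncertainty
    return mean + factor * sigma

def probability_of_improvement(mean, sigma, y_best, xi=0.1):
    # probability that the prediction improves on y_best by at least xi
    z = (mean - y_best - xi) / np.maximum(sigma, 1e-12)
    return norm.cdf(z)

def expected_improvement(mean, sigma, y_best, xi=0.1):
    # expected amount by which the prediction improves on y_best (trade-off parameter xi)
    z = (mean - y_best - xi) / np.maximum(sigma, 1e-12)
    return (mean - y_best - xi) * norm.cdf(z) + sigma * norm.pdf(z)
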
Extensions

For the acquisition functions there are different extensions that can be applied. These extensions are wrapped around the acquisition function and modify it. They can be configured using subsections indicated with [[ ]].

mean_width_scaling

Estimates the mean width (MW) of the predicted uncertainty bounds based on a grid.

parameter | Explanation | Example | Can be empty
scale_mean_width | float defining the MW budget, which defines the calibration parameter c used to scale the uncertainty | 0.05 | No
lower_bound | list which defines the lower bounds of the input space for the test points | -1.0, -1.0, -1.0, -1.0, -1.0 | No
upper_bound | list which defines the upper bounds of the input space for the test points | 1.0, 1.0, 1.0, 1.0, 1.0 | No
n_test_points | number of test points per input dimension (total number of grid points = n_test_points^d) | 200 | No
once_only | boolean defining that the calibration parameter c should only be calculated in the first step | true | No
order | order in which the extension should be applied in case multiple extensions are active | 1 | No
mean_width_scaling_mc

Estimates the mean width (MW) of the predicted uncertainty bounds based on Monte Carlo sampling.

parameter | Explanation | Example | Can be empty
scale_mean_width | float defining the MW budget, which defines the calibration parameter c used to scale the uncertainty | 0.05 | No
lower_bound | list which defines the lower bounds of the input space for the test points | -1.0, -1.0, -1.0, -1.0, -1.0 | No
upper_bound | list which defines the upper bounds of the input space for the test points | 1.0, 1.0, 1.0, 1.0, 1.0 | No
n_test_points | number of samples for the Monte Carlo sampling | 20000 | No
once_only | boolean defining that the calibration parameter c should only be calculated in the first step | true | No
order | order in which the extension should be applied in case multiple extensions are active | 1 | No
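The following is a hedged sketch (not the repository's code) of how such a Monte Carlo MW calibration can be computed, assuming a model with the calculate_mean_std interface shown in Section C:

import numpy as np

def mc_mean_width_calibration(model, lower_bound, upper_bound, n_test_points, scale_mean_width):
    lower = np.asarray(lower_bound, dtype=float)
    upper = np.asarray(upper_bound, dtype=float)
    # sample test points uniformly at random from the input box
    x_test = np.random.uniform(low=lower, high=upper, size=(n_test_points, lower.size))
    _, sigma = model.calculate_mean_std(x_test)['NOMU_1']  # assumed interface as above
    mean_width = np.mean(2 * sigma)  # mean width of the bounds mean +/- sigma over the test points
    # calibration parameter c that scales the uncertainty so that the MW matches the budget
    return scale_mean_width / mean_width
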
bounded_r

Applies a bounding function to the uncertainty. The NOMU method uses this bounding by default and thus it does not need to be configured here.

parameter | Explanation | Example | Can be empty
r_max | upper bound for the uncertainty | 2.0 | No
r_min | lower bound for the uncertainty | 1e-6 | No
order | order in which the extension should be applied in case multiple extensions are active | 1 | No

Acquisition Optimizer (required)

parameter | Explanation | Example | Can be empty
optimizer | string defining which acquisition function optimizer should be used (full list below) | direct | No
lower_search_bounds | lower bounds of the search space for the acquisition function optimization | -1.0, -1.0, -1.0, -1.0, -1.0 | No
upper_search_bounds | upper bounds of the search space for the acquisition function optimization | 1.0, 1.0, 1.0, 1.0, 1.0 | No
order | order in which the extension should be applied in case multiple extensions are active | 1 | No

Possible acquisition function optimizers:

  • grid_search
  • direct (DIRECT)
  • mip (Mixed Integer Programming (only for NOMU))
Extensions

For the acquisition function optimizer there are different extensions that can be applied. These extensions are wrapped around the acquisition function optimizer and modify it. They can be configured using subsections indicated with [[ ]] in the config file.

dynamic_c_exp

Specifies the parameters of the dynamic c procedure with an exponential decay (see Appendix B.3.2; note that the delta from the appendix is the epsilon here).

Note: if start_eps is not specified it is calculated as follows: start_eps = (x_range * range_fraction) / n_start_points
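For the example values above (x_range = 2.0, range_fraction = 0.25, n_start_points = 8), this gives start_eps = (2.0 * 0.25) / 8 = 0.0625.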

parameter | Explanation | Example | Can be empty
x_range | input range | 2 | No
n_start_points | number of starting samples of the run | 8 | No
range_fraction | fraction of the input space | 0.25 | No
end_eps | the last value for the epsilon | 0.01 | No
start_eps | the starting value for the epsilon | | Yes
max_increase_iter | number of times the dynamic c procedure can double the factor c | 15 | No
n_steps | number of steps over which the epsilon decays (e.g., for 64 total steps and n_steps=60, the last 4 steps use the smallest epsilon) | 60 | No
dynamic_c

Specifies the parameters of the dynamic c procedure with a linear decay.

Note: if start_eps is not specified it is calculated as follows: start_eps = (x_range * range_fraction) / n_start_points

parameter | Explanation | Example | Can be empty
x_range | input range | 2 | No
n_start_points | number of starting samples of the run | 8 | No
range_fraction | fraction of the input space | 0.25 | No
end_eps | the last value for the epsilon | 0.01 | No
start_eps | the starting value for the epsilon | | Yes
max_increase_iter | number of times the dynamic c procedure can double the factor c | 15 | No
n_steps | number of steps over which the epsilon decays (e.g., for 64 total steps and n_steps=60, the last 4 steps use the smallest epsilon) | 60 | No

Optimizer (required)

parameter | Explanation | Example | Can be empty
optimizer | string defining which NN optimizer to use (currently only adam is supported) | adam | No
learning_rate | learning rate of the optimizer | 0.001 | No
beta_1 | beta_1 of the optimizer | 0.9 | No
beta_2 | beta_2 of the optimizer | 0.999 | No
epsilon | epsilon of the optimizer | 1e-07 | No
amsgrad | amsgrad of the optimizer | false | No
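These fields mirror the arguments of the Adam optimizer in Keras; as a sketch of how such a section could be translated (the repository's own wiring may differ):

from tensorflow.keras.optimizers import Adam

# values taken from the [Optimizer] section of the example config above
optimizer = Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-07, amsgrad=False)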

NOMU

This section defines the parameters for the NOMU algorithm.

parameter | Explanation | Example | Can be empty
epochs | number of epochs to train the NOMU architecture | 1024 | No
r_max | upper bound for the readout map (\ell_max) | 2.0 | No
r_min | lower bound for the readout map (\ell_min) | 1e-6 | No
mip | boolean whether to use the MIP-compatible readout map | false | No
main_layers | numbers of nodes per layer for the main network, as a list | 5, 1024, 1024, 1024, 1 | No
side_layers | numbers of nodes per layer for the side network, as a list | 5, 1024, 1024, 1024, 1 | No
lowerbound_x_aug | lower bounds of the input space for the artificial input points | -1.0, -1.0, -1.0, -1.0, -1.0 | No
upperbound_x_aug | upper bounds of the input space for the artificial input points | 1.0, 1.0, 1.0, 1.0, 1.0 | No
n_aug | number of artificial input points | 500 | No
mu_sqr | factor for the squared term of the loss (pi_sqr) | 1.0 | No
mu_exp | factor for the exponential term of the loss (pi_exp) | 0.01 | No
c_exp | c for the exponential term of the loss | 30 | No
seed | seed for layer initialization | 3 | No
l2reg | L2-regularization parameter | 1e-8 | No
activation | string indicating which activation functions to use | relu | No

DE

This section defines the parameters for the Deep Ensembles (DE) algorithm.

parameter | Explanation | Example | Can be empty
epochs | number of epochs to train the networks | 1024 | No
n_ensembles | size of the ensemble | 5 | No
layers | numbers of nodes per layer for the network, as a list | 5, 256, 1024, 512, 1 | No
normalize_regularization | should the L2-regularization be adjusted according to the data noise assumption | true | No
no_noise | should the method be adjusted for the noiseless case: if true, loss=mse and each network has only one output (mean prediction); if false, loss=nll and each network has two outputs (mean and data noise prediction) | true | No
loss | overwrite the default loss defined by the no_noise parameter with a custom loss | | Yes
softplus_min_var | softplus minimum variance (used for nll if no_noise is false) | 1e-6 | Yes
seed | seed for layer initialization | 3 | No
l2reg | L2-regularization parameter | 1e-8 | No
activation | string indicating which activation functions to use | relu | No

DO

This section defines the parameters for the Monte Carlo (MC) dropout algorithm

parameter | Explanation | Example | Can be empty
epochs | number of epochs to train the network | 1024 | No
n_samples | number of stochastic forward passes | 10 | No
layers | numbers of nodes per layer for the network, as a list | 5, 256, 1024, 512, 1 | No
normalize_regularization | should the regularization be adjusted according to the data noise assumption | true | No
dropout | dropout probability | 0.2 | No
loss | overwrite with a custom loss | mse | No
seed | seed for layer initialization | 3 | No
l2reg | L2-regularization factor | 1e-8 | No
activation | string indicating which activation functions to use | relu | No

GP

This section defines the parameters for the Gaussian Process (GP) algorithm

parameter | Explanation | Example | Can be empty
alpha | inherited parameter of the GaussianProcessRegressor from sklearn | 1e-7 | No
optimizer | inherited parameter of the GaussianProcessRegressor from sklearn | fmin_l_bfgs_b | No
n_restarts_optimizer | inherited parameter of the GaussianProcessRegressor from sklearn | 10 | No
normalize_y | inherited parameter of the GaussianProcessRegressor from sklearn | true | No
copy_X_train | inherited parameter of the GaussianProcessRegressor from sklearn | true | No
random_state | inherited parameter of the GaussianProcessRegressor from sklearn | | Yes
std_min | inherited parameter of the GaussianProcessRegressor from sklearn | 1e-6 | No
kernel_once | whether the kernel optimizer optimizes the kernel parameters only during the first BO step | false | No

HDE

This section defines the parameters for the Hyper Deep Ensembles (HDE) algorithm.

parameter | Explanation | Example | Can be empty
epochs | number of epochs to train the network | 1024 | No
global_seed | seed producing the random initialisation of models | 1 | No
random_seed | boolean to use a random global seed | false | No
test_size | fraction of sample points to use as test set | 0.2 | No
kappa | number of different hyperparameter configurations per model initialization | 20 | No
K | size of the final ensemble | 5 | No
layers | numbers of nodes per layer for the network, as a list | 5, 256, 1024, 512, 1 | No
normalize_regularization | should the regularization be adjusted according to the data noise assumption | true | No
loss | overwrite with a custom loss | mse | Yes
no_noise | should the method be adjusted for the noiseless case: if true, loss=mse and each network has only one output (mean prediction); if false, loss=nll and each network has two outputs (mean and data noise prediction) | true | No
seed | seed for layer initialization | 3 | No
l2reg | L2-regularization factor | 1e-8 | No
activation | string indicating which activation functions to use | relu | No
Kernel

The kernel for the Gaussian Process can be configured individually

parameter | Explanation | Example | Can be empty
kernel | string defining which kernel to use (currently the only option is rbf) | rbf | No
constant_value | starting value for the constant value | 1 | No
constant_value_bounds | bounds within which the kernel optimizer can optimize the constant value parameter | 1e-5,1e5 | No
length_scale | starting value for the length scale | 1 | No
length_scale_bounds | bounds within which the kernel optimizer can optimize the length scale parameter | 1e-5,1e5 | No
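Since these parameters are largely passed through to scikit-learn, the [GP] and [[Kernel]] sections roughly correspond to the following construction (a sketch under that assumption; the repository's wrapper and the handling of std_min and kernel_once are its own):

from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ConstantKernel, RBF

# kernel: constant * RBF, with the bounds from the [[Kernel]] section
kernel = ConstantKernel(constant_value=1.0, constant_value_bounds=(1e-5, 1e5)) \
    * RBF(length_scale=1.0, length_scale_bounds=(1e-5, 1e5))
gp = GaussianProcessRegressor(
    kernel=kernel,
    alpha=1e-7,
    optimizer="fmin_l_bfgs_b",
    n_restarts_optimizer=10,
    normalize_y=True,
    copy_X_train=True,
)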

NOTE: To enable a head-to-head comparison of Table 3 and Figure 6:

  1. We use seeds 1,...,100 for the 5D experiments and 1,...,50 for the 10D and 20D experiments (unparallelized runtime on a single local machine for one seed and one test function: ~12h (5D), ~18h (10D), ~24h (20D)).
  2. We conduct these experiments on
    • system: Linux
    • version: SMP Debian 4.19.160-2 (2020-11-28)
    • platform: Linux-4.19.0-13-amd64-x86_64-with-debian-10.8
    • machines: Intel Xeon E5-2650 v4 2.20GHz processors with 48 logical cores and 128GB RAM and Intel E5 v2 2.80GHz processors with 40 logical cores and 128GB RAM
    • python: Python 3.7.3 [GCC 8.3.0] on linux
  3. These experiments are conducted with Tensorflow using the CPU only and no GPU

E. References

[1] Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning http://proceedings.mlr.press/v48/gal16.html

[2] Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles http://papers.nips.cc/paper/7219-simple-and-scalable-predictive-uncertainty-estimation-using-deep-ensembles.pdf

[3] Hyperparameter Ensembles for Robustness and Uncertainty Quantification https://proceedings.neurips.cc/paper/2020/file/481fbfa59da2581098e841b7afc122f1-Paper.pdf

[4] Total solar irradiance during the Holocene https://agupubs.onlinelibrary.wiley.com/doi/abs/10.1029/2009GL040142

F. Contact

Maintained by Jakob Weissteiner (weissteiner), Hanna Wutte (HannaSW), Jakob Heiss (JakobHeiss) and Marius Högger (mhoegger).