Exploring validation metrics for offline model-based optimisation with diffusion models

Installation

Please see INSTALL.org.

When this is done, cd into exps and run cp env.sh.bak env.sh and modify env.sh to define the following environment variables. Here is an example:

export SAVEDIR=<path to store experiment checkpoints>
export MUJOCO_PY_MUJOCO_PATH=~/bin/mujoco/mujoco210
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${MUJOCO_PY_MUJOCO_PATH}/bin:/usr/lib/nvidia

Experiments

The exps folder is is where experiments are launched from, and experiments are launched by invoking main.sh. Its usage is as follows:

# run source env.sh before running main.sh
bash main.sh <cls|sbgm> <experiment name> <path to json file>

Let’s break this down from left to right:

cls for training a classifier (i.e. the oracle regression model), sbgm for diffusion model.
<experiment name> is the name of the experiment, whose directory of results and checkpoints will be stored in $SAVEDIR/<experiment name>/<experiment id>. The experiment ID by default is taken from $SLURM_JOB_ID (because that is what I use internally), but in a non-Slurm environment this will be undefined. Therefore the script will replace it with Unix time. (If this is not desirable you can modify the script to do something else.)
Experiments are defined not through a messy string of argparse arguments but instead via a JSON dictionary. You can see all the possible options by viewing the respective dataclass in trainval.py.

By default, this script will copy the code to $SAVEDIR/<experiment name>/<experiment id>/code and cd into that directory to run the experiment. To avoid this simply set RUN_LOCAL=1.

Configs

Due to large file sizes I have only included the pretrained checkpoints for the oracles. These are needed to run the experiments defined in exps/json. First download the checkpoints from here and extract their contents to the save directory defined in env.sh (i.e. $savedir).

The following experiment configurations are available in exps/json:

.
├── ant
│   ├── train-diffusion-cfg.json
│   ├── train-diffusion-cg.json
│   ├── train-training-oracle.json
│   └── train-validation-oracle.json
├── hopper50
│   └── train-diffusion-cfg.json
├── kitty
│   ├── train-diffusion-cfg.json
│   ├── train-diffusion-cg.json
│   ├── train-training-oracle.json
│   └── train-validation-oracle.json
└── sd
    ├── HYPERPARAMETERS.org
    ├── train-diffusion-cfg.json
    ├── train-diffusion-cg.json
    ├── train-training-oracle.json
    └── train-validation-oracle.json

What the filenames signify:

train-training-oracle.json: only concerns classifier-based guidance experiments, this is an approximate oracle trained only on the training set. However, it is also trained on the same forward distribution as the diffusion models q(x0, ..., xT). In other words, they learn a regression model p(y|x_t;t) where x_t is the noised input at timestep t, and y is the reward variable.
train-validation-oracle.json: this is the validation set oracle, which is trained on both the training + validation sets. This is also the oracle which is used for the validation metrics.
train-diffusion-cfg.json: train diffusion model with classifier-free guidance. The diffusion model simultaneously learns both a noise predictor conditioned on y and without, and this can be used to derive a conditional score function.
train-diffusion-cg.json: train diffusion model with classifier-based guidance. This learns an unconditional score function which, when combined with the training oracle (see train-training-oracle) defines a conditional score function.

Evaluation

TODO.

Misc