Thank you for taking the time to visit the Supplementary Materials for the paper "Bounding the Optimal Value Function in Compositional Reinforcement Learning".
In this folder you will find the Code used to generate the results in the paper, as well as the data ("export" folder) used to generate the figures in the paper.
The optimal hyperparameters (as stated in the Appendix) for each clip method are in the dqn_sweep.py file.
- Reconstruct the reward plot with
python plot.py
- Reconstruct the bound violation plot with
python plot_BV.py
You'll need a Weights and Biases account to run sweeps.
- Pretrain the DQN models on primitive tasks with
python pretrain.py
- Run a sweep with optimal hyperparameters with
python dqn_sweep.py
- Export the results with
python export.py --entity <your wandb username>
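The three commands above are listed in the order they are meant to be run, so it can be convenient to chain them from one script. The following is a minimal sketch and not part of the repository: the helper names build_pipeline and run_pipeline are hypothetical, and it assumes the scripts sit in the working directory and that you are already logged in to Weights and Biases.

```python
import subprocess
import sys


def build_pipeline(entity):
    """Return the README commands, in order, as argument lists."""
    py = sys.executable  # reuse the interpreter that runs this script
    return [
        [py, "pretrain.py"],                    # pretrain DQN on primitive tasks
        [py, "dqn_sweep.py"],                   # sweep with optimal hyperparameters
        [py, "export.py", "--entity", entity],  # export the results
    ]


def run_pipeline(entity, cwd="."):
    """Run each step in turn and stop at the first failure."""
    for cmd in build_pipeline(entity):
        print("running:", " ".join(cmd[1:]))
        # check=True raises CalledProcessError on a non-zero exit code
        subprocess.run(cmd, cwd=cwd, check=True)
```

Calling run_pipeline with your wandb username as entity and the path to the Code/ dir as cwd would then run all three steps back to back.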
The following can be run locally and viewed through TensorBoard:
- Start TensorBoard from the Code/ dir with
tensorboard --logdir=tmp
- View the dashboard in a browser at localhost:6006
- Train SB3 DQN on the custom maze 6x6L_AND_D in 200k steps with
python dqn_baseline.py
- Train SB3 DQN on the custom maze 6x6L_AND_D in 200k steps. This requires two pretrained models, 6x6L and 6x6D, in the models/ dir; run python pretrain.py to generate them. Then run
python dqn_composed.py --env 6x6L_AND_D --clipmethod none --comptype and
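Because the composed run depends on the two pretrained primitives, a quick check before launching it can save a failed run. The sketch below is an illustration under assumptions, not repository code: the function name missing_primitives is hypothetical, and it assumes each model is stored in models/ under a name that, minus any extension, equals the task name (for example 6x6L.zip). Check the actual file names that pretrain.py writes before relying on it.

```python
from pathlib import Path

# Primitive tasks named in this README
REQUIRED_PRIMITIVES = ("6x6L", "6x6D")


def missing_primitives(models_dir="models", required=REQUIRED_PRIMITIVES):
    """Return the required primitive names with no matching entry in models_dir.

    Assumption: an entry matches when its name without extension equals
    the task name, so 6x6L_AND_D.zip does not count as 6x6L.
    """
    root = Path(models_dir)
    present = {p.stem for p in root.iterdir()} if root.is_dir() else set()
    return [name for name in required if name not in present]
```

An empty list means both primitives were found; otherwise run python pretrain.py first.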
Videos of the tabular experiments:
TabularExperiments/Videos/maxent_4room_sweep.gif
TabularExperiments/Videos/std_4room_sweep.gif
In the TabularExperiments folder, there is a frozen_lake_tests.py file with options in the main
function that can be used to run the tabular experiments.