All code was run on an Ubuntu 22.04.3 LTS machine with Python 3.10.12. The exact requirements, with versions, can be found in requirements.txt.
- Parts of the code are inspired by the POUCT + particles implementation of pomdp-py and by this fork of David Silver's POMCP code, as the original is no longer available.
- Credit is also due for some of the environments. Some of the code of the FFG environment comes from the implementation in MADP. MARS is a Python variant of this MARS environment, and CaptureTarget was built from the code of ROLA.
Install the basic requirements via pip install -r requirements.txt
Below is an example of how all experiments can be run with the convenience script.
python3 -m venv .venv
source .venv/bin/activate
python3 -m pip install -U pip setuptools wheel
python3 -m pip install -r requirements.txt
bash run_exp.bash
After running the above commands, or any of the individual runs defined in run_exp.bash, the outputs will be in experiments/benchmarks/$ID/$SEED, where the seed $SEED is 1337 by default and $ID is set by the --id parameter when starting the run. The outputs consist of two .csv files per experiment (one with discounted and one with undiscounted rewards), a params.json containing the parameters, and a large serialization of the results dictionary built during the experiment, named results.pickle. This pickle file can be loaded to conveniently postprocess the results for the paper, as is done, for example, in the Jupyter notebook plot_results_final.ipynb.
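As a minimal sketch of that postprocessing step, the snippet below loads params.json and results.pickle from a single run directory. The helper name load_run is hypothetical and not part of the repository. The layout of the results dictionary is not documented here, so the example only prints its type.

```python
import json
import pickle
from pathlib import Path


def load_run(run_dir):
    """Load the parameters and pickled results of one run.

    `run_dir` is a directory such as experiments/benchmarks/$ID/$SEED.
    Illustrative helper only; it is not part of the repository.
    """
    run_dir = Path(run_dir)
    with open(run_dir / "params.json", "r", encoding="utf-8") as f:
        params = json.load(f)
    # Only unpickle files you produced yourself: unpickling can run code.
    with open(run_dir / "results.pickle", "rb") as f:
        results = pickle.load(f)
    return params, results


if __name__ == "__main__":
    run_dir = Path("experiments/benchmarks/fff_exp/1337")
    if run_dir.exists():
        params, results = load_run(run_dir)
        print(params)
        print(type(results))
```

If the results were pickled with custom classes from the repository, unpickling presumably has to happen with the repository on the Python path.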
An individual example taken from run_exp.bash is as follows:
python3 run_experiments.py fff --max_time 5 --episodes 100 --id fff_exp --multi 34 | tee fff.log
This will write its outputs, including the results pickle, to experiments/benchmarks/fff_exp/1337/. In the notebook (notebooks/plot_results_final.ipynb), one can then replace the instances of dirname with experiments/benchmarks/fff_exp/1337/results.pickle appended to your working directory, or with a relative path such as ../experiments/benchmarks/fff_exp/1337/results.pickle when executing the notebook from within the notebooks folder. The notebook contains separate Markdown headings for producing the results of each experiment.
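The path convention above can be sketched as a small helper. The function name results_pickle_path and its parameters are hypothetical. Whether the notebook expects dirname as a string or a path object is an assumption, so wrap the result in str() if needed.

```python
from pathlib import Path


def results_pickle_path(exp_id, seed=1337, base_dir="."):
    """Return <base_dir>/experiments/benchmarks/<exp_id>/<seed>/results.pickle.

    Hypothetical helper mirroring the output layout described above.
    """
    return (
        Path(base_dir)
        / "experiments"
        / "benchmarks"
        / exp_id
        / str(seed)
        / "results.pickle"
    )


# Relative to the repository root:
print(results_pickle_path("fff_exp").as_posix())
# experiments/benchmarks/fff_exp/1337/results.pickle

# Relative to the notebooks folder, one level below the root:
dirname = results_pickle_path("fff_exp", base_dir="..")
print(dirname.as_posix())
# ../experiments/benchmarks/fff_exp/1337/results.pickle
```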
The file run_experiments.py is the main starting point for reproducing the experiments in the paper. It controls episode_runner.py, which is the main entry point to run a certain number of episodes on an environment.
The run_experiments.py file can be started with a few parameters; see experiment_helper for the commands used for the paper. Keep in mind that, with those commands, each call to episode_runner.py runs up to 34 episodes in parallel, each in its own process, since the argument --multi 34 is passed.
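To illustrate what the --multi argument amounts to, here is a minimal sketch of distributing episodes over a pool of worker processes. The default worker count is half the available CPU threads, as stated in the --help text. The functions run_episode, default_num_workers, and run_episodes_parallel are hypothetical stand-ins, not the repository's implementation.

```python
import multiprocessing as mp
import os


def run_episode(episode_index):
    """Placeholder for one episode; a real run would return its rewards."""
    return episode_index


def default_num_workers():
    """Half the number of CPU threads, but always at least one worker."""
    return max(1, (os.cpu_count() or 2) // 2)


def run_episodes_parallel(num_episodes, num_workers=None):
    """Distribute episodes over a pool of worker processes."""
    if num_workers is None:
        num_workers = default_num_workers()
    with mp.Pool(processes=num_workers) as pool:
        # map() preserves the order of the episode indices in its result.
        return pool.map(run_episode, range(num_episodes))


if __name__ == "__main__":
    # Analogous in spirit to passing --multi 2 with 4 episodes.
    print(run_episodes_parallel(num_episodes=4, num_workers=2))
```

Passing a large worker count on a machine with fewer cores oversubscribes the CPU, which matters when the search is limited by wall-clock time (--max_time) rather than by the number of simulations.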
See the --help output of episode_runner.py to run specific instances. It's copied below for convenience.
usage: episode_runner.py [-h] [--random] [--joint] [--num_agents NUM_AGENTS] [--horizon HORIZON] [--action_coordination {ve,mp}] [--num_episodes NUM_EPISODES] [--num_sims NUM_SIMS] [--max_time MAX_TIME] [--exploration_const EXPLORATION_CONST] [--discount DISCOUNT] [--no_particles]
[--num_particles NUM_PARTICLES] [--max_depth MAX_DEPTH] [--dont_reuse_trees] [--mmdp] [--progressive_widening] [--likelihood_sampling] [--weighted_particle_filtering] [--factored_statistics] [--pft] [--use_sim_particles] [--smosh_errors] [--rand_errors] [--save]
[--multithreaded [PERIOD]] [--seed SEED] [--id ID] [--store_results] [--render]
env [experiment_names ...]
positional arguments:
env
experiment_names (Optional) give the function identifier of any experiment to run that is available in this file. E.g. `run_vanilla_pomcp`.
options:
-h, --help show this help message and exit
--random Use random policy, e.g. for baseline result.
--joint Run experiment using joint action and observation space, as in vanilla POMCP/Sparse-PFT.
--num_agents NUM_AGENTS, --n NUM_AGENTS
--horizon HORIZON, --h HORIZON
--action_coordination {ve,mp}
--num_episodes NUM_EPISODES, --episodes NUM_EPISODES
Number of episodes to run.
--num_sims NUM_SIMS, --sims NUM_SIMS
Maximum number of simulation function calls in the tree search.
--max_time MAX_TIME, --time MAX_TIME
Maximum time spent in the tree search in seconds.
--exploration_const EXPLORATION_CONST, --c EXPLORATION_CONST
UCB1 exploration constant c.
--discount DISCOUNT, --gamma DISCOUNT
Discount factor in floats (should meet 0 <= y <= 1).
--no_particles Do not use particle filters. The fallback is to run with POUCT, i.e. with a belief distribution, which might not be implemented for every environment.
--num_particles NUM_PARTICLES, --np NUM_PARTICLES, --p NUM_PARTICLES
Specify the number of particles in each factored filter or in the joint filter, depending on the algorithm set-up.
--max_depth MAX_DEPTH
Maximum depth of the tree.
--dont_reuse_trees Rebuild tree every step in the episode, not making use of previous tree search results.
--mmdp Run in MMDP setting. Meaning: pick the true state of the environment in every simulation call instead of sampling from the belief.
--progressive_widening, --dpw
Add factored progressive widening to the tree search algorithm to increase depth of the search. Might negatively influence results.
--likelihood_sampling, --ls
Belief Likelihood-based asymmetric sampling.
--weighted_particle_filtering, --weighted, --wpf
Use weighted particle filtering, assumes and requires an explicit observation model.
--factored_statistics, --fs
Factored statistics / value version of the algorithm. Use with --joint only.
--pft Use the (factored-trees) Particle Filter Tree algorithm.
--use_sim_particles Merge the updated belief and simulation particles.
--smosh_errors Ignore exceptions during multithreading and keep executing the remaining episodes.
--rand_errors Ignore particle filter exceptions during searching and keep executing the remaining episode with a random policy.
--save Save intermediate results to disk for debugging. Might not work when running multithreaded.
--multithreaded [PERIOD], --multi [PERIOD]
Run episodes multithreaded, every episode runs in its own process. Maximum number of processes is half the number of CPU threads by default but can be supplied.
--seed SEED, --s SEED
--id ID Experiment identifier, determines which directory the results are stored to.
--store_results Store the benchmark results in a CSV.
--render
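The --discount option and the two .csv outputs both rest on the distinction between discounted and undiscounted rewards. As a generic illustration (not the repository's code), the discounted return of a reward sequence r_0, r_1, ... with discount factor gamma is the sum over t of gamma^t * r_t, and the undiscounted return is the special case gamma = 1. The function name discounted_return is hypothetical.

```python
def discounted_return(rewards, discount):
    """Sum of discount**t * rewards[t] over all time steps t."""
    if not 0.0 <= discount <= 1.0:
        raise ValueError("discount must satisfy 0 <= discount <= 1")
    total = 0.0
    # Accumulate backwards: G_t = r_t + discount * G_{t+1}.
    for reward in reversed(rewards):
        total = reward + discount * total
    return total


rewards = [1.0, 0.0, 2.0]
print(discounted_return(rewards, 1.0))  # 3.0 (undiscounted)
print(discounted_return(rewards, 0.5))  # 1.5 = 1.0 + 0.5 * 0.0 + 0.25 * 2.0
```

How the repository aggregates per-step rewards into the two .csv files is not shown here; the sketch only makes the two quantities concrete.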