BOE

Bandit Observation Engine: simulating context-free bandits.

Running Experiment

First, create a YAML file similar to egreedy-basics.yaml.

To see run options, including param names, type:

python -m boe.experimenting --help

Note that contents of the py directory should be included in PYTHONPATH.

Features

Delayed reward (initial delay)
Periodic update (periodic delay)
Periodic snapshot of bandit state
HTML report on experiment

Supported algorithms

Reported Metrics

There are four reported metrics, which are saved at each snapshot:

Probability of selection per arm
Average reward per arm
Cumulative reward per arm
Global cumulative reward

Average reward per arm: Upper Confidence Bound

There is an option of reporting the Upper Confidence Bound (UCB) along with the average reward per arm. The parameter report-ucb defines this behavior.

When UCB is reported, there is an option to discount the value, linearly. This is strictly for plotting purposes. UCB is relative measure, and as such, should be used to estimate the confidence of knowledge of one arm over other arms. Not as an absolute measure.

The computed UCB at each snapshot is stored as is.

Environment

Use python3.

About

Bandit Observation Engine: simulating context-free bandits.

Languages

Language:Python 96.5%Language:Shell 1.9%Language:CSS 1.6%