ucb
An upper confidence bounds algorithm for multi-armed bandit problems
This implementation is based on Bandit Algorithms for Website Optimization and related empirical research in "Algorithms for the multi-armed bandit problem". In addition, this module conforms to the BanditLab/2.0 specification. Now written in Typescript!
Get started
Prerequisites
- Node.js 6.x+ (LTS track)
- npm
Installing
Install with npm
(or yarn
):
npm install ucb --save
Caveat emptor
This implementation often encounters extended floating point numbers. Arm selection is therefore subject to JavaScript's floating point precision limitations. For general information about floating point issues see the floating point guide.
Usage
-
Create an optimizer with
3
arms:const { Ucb } = require('ucb'); const ucb = new Ucb({ arms: 3 });
-
Select an arm (exploits or explores, determined by ucb):
ucb.select().then((arm) => { // do something based on the chosen arm }); // or const arm = ucb.selectSync();
-
Report the reward earned from a chosen arm:
ucb.reward(arm, value);
API
Ucb(config)
Creates a new Ucb.
Arguments
config
(Object
): ucb instance parameters
The config
object supports two optional parameters:
arms
(Number
, Integer): The number of arms over which the optimization will operate; defaults to2
Alternatively, the state
object resolved from Ucb#serialize
can be passed as config
.
Returns
An instance of Ucb.
Example
const { Ucb } = require('ucb');
const ucb = new Ucb();
assert.equal(ucb.arms, 2);
Or, with a passed config
:
const { Ucb } = require('ucb');
const ucb = new Ucb({ arms: 4 });
assert.equal(ucb.arms, 4);
Ucb#select()
Choose an arm to play, according to Ucb.
Arguments
None
Returns
A Promise
that resolves to a Number
corresponding to the associated arm index.
Example
const { Ucb } = require('ucb');
const ucb = new Ucb();
ucb.select().then(arm => console.log(arm));
// or
const arm = ucb.selectSync();
Ucb#reward(arm, reward)
Inform Ucb about the payoff from a given arm.
Arguments
arm
(Number
, Integer): the arm index (provided fromUcb#select()
)reward
(Number
): the observed reward value (which can be 0 to indicate no reward)
Returns
A Promise
that resolves to an updated instance of ucb. (The original instance is mutated as well.)
Example
const { Ucb } = require('ucb');
const ucb = new Ucb();
ucb.reward(0, 1).then(updatedUcb => console.log(updatedUcb));
// or
const updatedUcb = ucb.rewardSync(0, 1);
console.log(updatedUcb);
Ucb#serialize()
Obtain a plain object representing the internal state of ucb.
Arguments
None
Returns
A Promise
that resolves to a stringify-able Object
with parameters needed to reconstruct ucb state.
Example
const { Ucb } = require('ucb');
const ucb = new Ucb();
ucb.serialize().then(state => console.log(state));
// or
const state = ucb.serializeSync();
console.log(state);
Development
Contribute
PRs are welcome! For bugs, please include a failing test which passes when your PR is applied. Travis CI provides on-demand testing for commits and pull requests.
Workflow
- Feature development and bug fixing should occur on a non-master branch.
- Changes should be submitted to master via a Pull Request.
- Pull Requests should be merged via a merge commit. Local "in-process" commits may be squashed prior to pushing to the remote feature branch.
Tests
To run the unit test suite:
npm test
Or, to run the test suite and view test coverage:
npm run coverage