KaleabTessera / Multi-Armed-Bandit

Implementation of the greedy, ε-greedy, and Upper Confidence Bound (UCB) algorithms for the multi-armed bandit problem.


Multi-Armed-Bandit

Description

This is an implementation of the greedy, $\epsilon$-greedy, and Upper Confidence Bound (UCB) algorithms for solving the multi-armed bandit problem. Implementation details of these algorithms can be found in Chapter 2 of *Reinforcement Learning: An Introduction* by Richard Sutton and Andrew Barto.
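As a rough illustration of the three action-selection rules (a minimal NumPy sketch, not this repository's actual code): greedy always exploits the current value estimates (and relies on optimistic initial values such as $Q_1 = 5$ for exploration), $\epsilon$-greedy explores a random arm with probability $\epsilon$, and UCB adds an exploration bonus $c\sqrt{\ln t / N_a}$ to each estimate. The names `select_action`, `update`, `Q`, and `N` are illustrative.

```python
# Illustrative sketch of the three action-selection rules for a k-armed bandit
# with sample-average value estimates. Not the repository's actual classes.
import numpy as np

rng = np.random.default_rng(0)

def select_action(Q, N, t, method="egreedy", eps=0.1, c=2.0):
    """Pick an arm given value estimates Q and pull counts N at time step t."""
    k = len(Q)
    if method == "greedy":
        return int(np.argmax(Q))                      # always exploit current estimates
    if method == "egreedy":
        if rng.random() < eps:                        # explore with probability eps
            return int(rng.integers(k))
        return int(np.argmax(Q))                      # otherwise exploit
    if method == "ucb":
        if np.any(N == 0):                            # pull each arm once first
            return int(np.argmax(N == 0))             # (avoids division by zero)
        return int(np.argmax(Q + c * np.sqrt(np.log(t) / N)))
    raise ValueError(f"unknown method: {method}")

def update(Q, N, action, reward):
    """Incremental sample-average update: Q <- Q + (R - Q) / N."""
    N[action] += 1
    Q[action] += (reward - Q[action]) / N[action]
```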

How to Install:

```bash
# In project root folder
pip install -r requirements.txt
```

How to Run:

```bash
# In project root folder
./run.sh
```

Tasks

Part 1

A plot of reward over time (averaged over 100 runs each) on the same axes, for $\epsilon$-greedy with $\epsilon = 0.1$, greedy with $Q_1 = 5$, and UCB with $c = 2$.

[Plot: Part 1]
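A minimal sketch of how such a curve could be produced (not this repository's experiment code; it assumes the `select_action`/`update` helpers from the sketch above and a standard 10-armed Gaussian testbed): run each algorithm for 1000 steps and average the reward at each step over 100 independent runs.

```python
# Sketch of the averaging experiment; assumes select_action/update defined above.
import numpy as np

def run_bandit(method, steps=1000, k=10, Q1=0.0, **kwargs):
    """One run on a stationary k-armed Gaussian testbed; returns the reward per step."""
    rng = np.random.default_rng()
    q_true = rng.normal(0.0, 1.0, k)      # true action values
    Q = np.full(k, Q1, dtype=float)       # optimistic when Q1 = 5 (used by pure greedy)
    N = np.zeros(k, dtype=int)
    rewards = np.empty(steps)
    for t in range(1, steps + 1):
        a = select_action(Q, N, t, method=method, **kwargs)
        r = rng.normal(q_true[a], 1.0)    # reward drawn around the true action value
        update(Q, N, a, r)
        rewards[t - 1] = r
    return rewards

# Average reward per step over 100 independent runs for each algorithm.
curves = {
    "greedy, Q1=5":        np.mean([run_bandit("greedy", Q1=5.0)    for _ in range(100)], axis=0),
    "eps-greedy, eps=0.1": np.mean([run_bandit("egreedy", eps=0.1)  for _ in range(100)], axis=0),
    "UCB, c=2":            np.mean([run_bandit("ucb", c=2.0)        for _ in range(100)], axis=0),
}
```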

Part 2

A summary comparison plot of the reward over the first 1000 steps for the three algorithms with different hyperparameter values.

[Plot: Part 2]
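One way to build such a summary (again a sketch on top of the hypothetical `run_bandit` above, with illustrative hyperparameter values that are not necessarily the ones used in this repo): for each setting, average the mean reward over the first 1000 steps across many runs.

```python
# Sketch of the summary comparison; assumes run_bandit defined in the sketch above.
import numpy as np

def mean_reward(method, runs=100, steps=1000, **kwargs):
    """Mean reward over the first `steps` steps, averaged across `runs` runs."""
    return float(np.mean([run_bandit(method, steps=steps, **kwargs).mean()
                          for _ in range(runs)]))

# Illustrative hyperparameter grid (assumed, not taken from the repo).
summary = {
    **{f"eps-greedy, eps={e}": mean_reward("egreedy", eps=e) for e in (0.01, 0.1, 0.25)},
    **{f"greedy, Q1={q}":      mean_reward("greedy", Q1=q)   for q in (0.0, 5.0)},
    **{f"UCB, c={c}":          mean_reward("ucb", c=c)       for c in (1.0, 2.0, 4.0)},
}
```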


Languages

Python 99.9%, Shell 0.1%