jia-yi-chen / Bandit-and-Reinforcement-Learning

Python implementation for Reinforcement Learning algorithms -- Bandit algorithms, MDP, Dynamic Programming (value/policy iteration), Model-free Control (off-policy Monte Carlo, Q-learning)

Reinforcement Learning

Author: Jiayi Chen

Time: Oct 2020

Implemented Algorithms:

  • Bandit:
    • Multi-arm Bandit:
      • epsilon-greedy
      • upper confidence bound (UCB)
      • Thompson Sampling (TS)
      • Perturbed-history Exploration (PHE)
    • Contextual Linear Bandit:
      • LinUCB
      • LinTS
      • LinPHE
  • Reinforcement Learning:
    • Dynamic programming solution for Markov Decision Process (known environment):
      • value iteration
      • policy iteration
    • Model-free control:
      • off-policy Monte Carlo (MC) control
      • off-policy Temporal Difference (TD) control (i.e., Q-learning)

Requirements

  • Python 3

Getting Started

Bandit

Simulation environment:

  • Action: articles
  • User: users
  • In each time step, we iterate over all users, recommend an article to each one, and receive a reward for the recommended article.

run "/bandit/SimulationComparison.py"

See "/bandit/lib/$ALGORITHMNAME$.py" for the implementation of each algorithm.
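
To give a rough idea of what the simulation loop looks like, here is a minimal epsilon-greedy sketch. It is not the code in this repository: the names `EpsilonGreedy` and `simulate`, the Bernoulli click probabilities, and the choice of one independent bandit per user are all assumptions made for illustration.

```python
import random


class EpsilonGreedy:
    """Epsilon-greedy multi-armed bandit with incremental mean estimates."""

    def __init__(self, n_arms, epsilon=0.1, rng=None):
        self.epsilon = epsilon
        self.rng = rng or random.Random()
        self.counts = [0] * n_arms      # number of pulls per arm
        self.values = [0.0] * n_arms    # running mean reward per arm

    def select_arm(self):
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.counts))
        return max(range(len(self.values)), key=self.values.__getitem__)

    def update(self, arm, reward):
        self.counts[arm] += 1
        # Incremental mean: new = old + (reward - old) / n
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


def simulate(click_probs, n_users=3, n_steps=2000, epsilon=0.1, seed=0):
    """Each time step, recommend one article to every user and observe a 0/1 reward."""
    rng = random.Random(seed)
    # One independent bandit per user (an assumption of this sketch).
    bandits = [EpsilonGreedy(len(click_probs), epsilon, rng) for _ in range(n_users)]
    total_reward = 0
    for _ in range(n_steps):
        for bandit in bandits:
            arm = bandit.select_arm()
            # Hypothetical Bernoulli reward: the user clicks with probability click_probs[arm].
            reward = 1 if rng.random() < click_probs[arm] else 0
            bandit.update(arm, reward)
            total_reward += reward
    return bandits, total_reward


if __name__ == "__main__":
    bandits, total = simulate([0.2, 0.5, 0.8])
    print("average reward per recommendation:", total / (3 * 2000))
    print("estimates for user 0:", [round(v, 2) for v in bandits[0].values])
```

UCB, Thompson Sampling, and PHE would differ only in how `select_arm` trades off exploration and exploitation; the outer loop over time steps and users stays the same.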

Reinforcement Learning

Simulation environment:

4-by-4 grid world. The agent's goal is to reach the goal cell (grid[3][3]) as quickly as possible while avoiding the pits (cells grid[1][1] and grid[2][1]).
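
The grid world described above could be modeled roughly as follows. This is only a sketch: the reward values, the deterministic moves, the start cell (0, 0), and treating pits as terminal are assumptions, and the `step` function is a hypothetical name, not the repository's API.

```python
# Hypothetical 4x4 grid world; rewards and dynamics are assumptions, not the repo's settings.
GRID_SIZE = 4
GOAL = (3, 3)
PITS = {(1, 1), (2, 1)}
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

STEP_REWARD = -1.0    # a small cost per move encourages short paths
PIT_REWARD = -10.0
GOAL_REWARD = 10.0


def step(state, action):
    """Return (next_state, reward, done) for a deterministic move."""
    d_row, d_col = ACTIONS[action]
    # Moves that would leave the grid keep the agent on the boundary cell.
    row = min(max(state[0] + d_row, 0), GRID_SIZE - 1)
    col = min(max(state[1] + d_col, 0), GRID_SIZE - 1)
    next_state = (row, col)
    if next_state == GOAL:
        return next_state, GOAL_REWARD, True
    if next_state in PITS:
        return next_state, PIT_REWARD, True
    return next_state, STEP_REWARD, False


# Walk along the top row and then down the right column, avoiding both pits.
state, total = (0, 0), 0.0
for action in ["right", "right", "right", "down", "down", "down"]:
    state, reward, done = step(state, action)
    total += reward
print(state, total, done)  # prints: (3, 3) 5.0 True
```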

Dynamic programming for Markov Decision Process:

run "/rl/runDP.py"
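
For readers unfamiliar with the dynamic programming approach, the sketch below shows value iteration on a grid like the one described above. It is an illustration under stated assumptions, not the contents of runDP.py: the reward values (-1 per move, -10 for a pit, +10 for the goal), the deterministic transitions, the discount factor, and the names `transition` and `value_iteration` are all hypothetical.

```python
GRID_SIZE = 4
GOAL = (3, 3)
PITS = {(1, 1), (2, 1)}
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
STATES = [(r, c) for r in range(GRID_SIZE) for c in range(GRID_SIZE)]
TERMINAL = PITS | {GOAL}


def transition(state, action):
    """Known deterministic model: return (next_state, reward). Rewards are assumed values."""
    d_row, d_col = ACTIONS[action]
    row = min(max(state[0] + d_row, 0), GRID_SIZE - 1)
    col = min(max(state[1] + d_col, 0), GRID_SIZE - 1)
    nxt = (row, col)
    if nxt == GOAL:
        return nxt, 10.0
    if nxt in PITS:
        return nxt, -10.0
    return nxt, -1.0


def value_iteration(gamma=0.9, tol=1e-8):
    """Apply the Bellman optimality backup until the largest change is below tol."""
    values = {s: 0.0 for s in STATES}
    while True:
        delta = 0.0
        for s in STATES:
            if s in TERMINAL:
                continue  # terminal states keep value 0
            best = max(r + gamma * values[n]
                       for n, r in (transition(s, a) for a in ACTIONS))
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            break
    # Extract the greedy policy with respect to the converged values.
    policy = {}
    for s in STATES:
        if s in TERMINAL:
            continue
        policy[s] = max(ACTIONS, key=lambda a: transition(s, a)[1]
                        + gamma * values[transition(s, a)[0]])
    return values, policy


if __name__ == "__main__":
    values, policy = value_iteration()
    for r in range(GRID_SIZE):
        print(" ".join(f"{values[(r, c)]:6.2f}" for c in range(GRID_SIZE)))
```

Policy iteration uses the same model but alternates between evaluating a fixed policy and improving it greedily, instead of taking the maximum inside every backup.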

Model-free MC/TD control:

run "/rl/runRL.py"
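
As a rough illustration of off-policy TD control, the following sketch runs tabular Q-learning on a grid like the one described above. It is not the code in runRL.py: the rewards, the start cell (0, 0), the hyperparameters, the episode length cap, and the names `step`, `q_learning`, and `greedy_path` are assumptions chosen for this example.

```python
import random

GRID_SIZE = 4
GOAL = (3, 3)
PITS = {(1, 1), (2, 1)}
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(state, action):
    """Return (next_state, reward, done); reward values are assumed, not the repo's."""
    d_row, d_col = ACTIONS[action]
    row = min(max(state[0] + d_row, 0), GRID_SIZE - 1)
    col = min(max(state[1] + d_col, 0), GRID_SIZE - 1)
    nxt = (row, col)
    if nxt == GOAL:
        return nxt, 10.0, True
    if nxt in PITS:
        return nxt, -10.0, True
    return nxt, -1.0, False


def q_learning(episodes=5000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    n_actions = len(ACTIONS)
    q = {((r, c), a): 0.0
         for r in range(GRID_SIZE) for c in range(GRID_SIZE) for a in range(n_actions)}
    for _ in range(episodes):
        state = (0, 0)
        for _t in range(100):  # cap the episode length
            # Behavior policy: epsilon-greedy with respect to the current Q-table.
            if rng.random() < epsilon:
                action = rng.randrange(n_actions)
            else:
                action = max(range(n_actions), key=lambda a: q[(state, a)])
            next_state, reward, done = step(state, action)
            # Off-policy target: bootstrap from the greedy value of the next state.
            best_next = 0.0 if done else max(q[(next_state, a)] for a in range(n_actions))
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
            if done:
                break
    return q


def greedy_path(q, start=(0, 0), max_steps=20):
    """Follow the greedy policy from start and return the visited states."""
    path, state = [start], start
    for _ in range(max_steps):
        action = max(range(len(ACTIONS)), key=lambda a: q[(state, a)])
        state, _, done = step(state, action)
        path.append(state)
        if done:
            break
    return path


if __name__ == "__main__":
    print(greedy_path(q_learning()))
```

Off-policy Monte Carlo control would instead update from complete episode returns, reweighted by importance sampling, rather than bootstrapping from the next state's estimate.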

Languages

Language:Python 100.0%