Reinforcement Learning

License: MIT | Python 3.5+

Installation (near future)

pip install mypyrl

Overview

This repository contains code that implements algorithms and models from Sutton and Barto's book "Reinforcement Learning: An Introduction," a classic text that provides a comprehensive introduction to the field.

The code in this repository is organized into several modules, each of which covers a different topic.

Multi-Armed Bandits

  • Epsilon-Greedy
  • Optimistic Initial Values
  • Gradient Bandit
  • Constant step size α (non-stationary)
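
As a concrete illustration of the bandit methods listed above, here is a minimal ε-greedy sketch in plain NumPy. It is not part of this package's API; the function name, the Gaussian reward model, and the parameters eps, steps, and seed are illustrative assumptions.

import numpy as np

def epsilon_greedy_bandit(true_means, eps=0.1, steps=1000, seed=0):
    # Epsilon-greedy agent on a k-armed bandit with unit-variance Gaussian rewards.
    rng = np.random.default_rng(seed)
    k = len(true_means)
    q = np.zeros(k)             # sample-average action-value estimates
    n = np.zeros(k)             # per-arm pull counts
    rewards = np.empty(steps)
    for t in range(steps):
        if rng.random() < eps:
            a = int(rng.integers(k))    # explore: pick a random arm
        else:
            a = int(np.argmax(q))       # exploit: pick the current best estimate
        r = rng.normal(true_means[a], 1.0)
        n[a] += 1
        q[a] += (r - q[a]) / n[a]       # incremental sample-average update
        rewards[t] = r
    return q, rewards

q_est, history = epsilon_greedy_bandit(true_means=[0.1, 0.5, 0.9])

The constant step-size variant in the last list item simply replaces the 1/n update with a fixed α, which keeps tracking non-stationary reward distributions.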

Tabular Solution Methods

Model-based methods

  • Policy Evaluation
  • Policy Iteration
  • Value Iteration
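
To give a flavour of these model-based (dynamic programming) solvers, below is a generic value-iteration sketch over an explicitly enumerated MDP. It does not use this package's interface; the transition tensor P, the reward matrix R, and the toy numbers in the usage example are illustrative assumptions.

import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8):
    # P[a, s, s'] : transition probabilities, R[s, a] : expected immediate rewards.
    n_actions, n_states, _ = P.shape
    v = np.zeros(n_states)
    while True:
        # One-step lookahead: Q[s, a] = R[s, a] + gamma * sum_s' P[a, s, s'] * v[s']
        q = R + gamma * np.einsum('asn,n->sa', P, v)
        v_new = q.max(axis=1)
        if np.max(np.abs(v_new - v)) < tol:
            break
        v = v_new
    return v, q.argmax(axis=1)   # optimal state values and a greedy policy

# Tiny two-state, two-action MDP as a smoke test (made-up numbers)
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.0, 1.0]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
v_star, pi_star = value_iteration(P, R)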

Model-free methods

  • Monte Carlo estimation and control
    • First-visit α-MC
    • Every-visit α-MC
    • MC with Exploring Starts
    • Off-policy MC, ordinary and weighted importance sampling
  • Temporal Difference
    • TD(n) estimation
    • n-step SARSA
    • n-step Q-learning
    • n-step Expected SARSA
    • Double Q-learning
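
As an illustration of the model-free control methods above, here is a one-step tabular Q-learning sketch (the repository itself lists n-step variants). It follows the ((next_state, reward), done) transition signature described later in this README, but is otherwise generic code rather than the package's API; the start-state assumption and the parameter values are illustrative.

import numpy as np

def q_learning(states, actions, transition, gamma=1.0, alpha=0.1,
               eps=0.1, n_episodes=500, seed=0):
    # One-step tabular Q-learning with an eps-greedy behaviour policy.
    # transition(state, action) is assumed to return ((next_state, reward), done).
    rng = np.random.default_rng(seed)
    s_idx = {s: i for i, s in enumerate(states)}
    Q = np.zeros((len(states), len(actions)))
    for _ in range(n_episodes):
        s = states[0]                    # assumes the first state is the start state
        done = False
        while not done:
            if rng.random() < eps:
                a = int(rng.integers(len(actions)))
            else:
                a = int(np.argmax(Q[s_idx[s]]))
            (s_next, r), done = transition(s, actions[a])
            target = r if done else r + gamma * np.max(Q[s_idx[s_next]])
            Q[s_idx[s], a] += alpha * (target - Q[s_idx[s], a])
            s = s_next
    return Q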

Planning and Learning Methods

  • Dyna-Q/Dyna-Q+
  • Prioritized Sweeping
  • Monte Carlo Tree Search (MCTS)
  • Trajectory Sampling
  • Real-Time Dynamic Programming (RTDP)
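
The Dyna-Q entry above combines direct reinforcement learning with planning from a learned model. The sketch below shows one combined update, assuming integer-indexed states, a deterministic model stored as a dict, and an externally supplied NumPy random generator; none of this mirrors the package's actual interface.

import numpy as np

def dyna_q_update(Q, model, s, a, r, s_next, done, rng,
                  alpha=0.1, gamma=0.95, n_planning=10):
    # Direct RL step on the real transition.
    target = r if done else r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])
    # Deterministic model update: remember the last outcome of (s, a).
    model[(s, a)] = (r, s_next, done)
    # Planning: replay n_planning previously seen state-action pairs.
    keys = list(model.keys())
    for _ in range(n_planning):
        ps, pa = keys[rng.integers(len(keys))]
        pr, ps_next, pdone = model[(ps, pa)]
        ptarget = pr if pdone else pr + gamma * np.max(Q[ps_next])
        Q[ps, pa] += alpha * (ptarget - Q[ps, pa])
    return Q, model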

Approximate Solution Methods

  • On-policy Prediction
    • Linear SGD/semi-SGD
    • ANN
    • Least-Squares TD
    • Memory-based
    • Kernel-based
  • On-policy Control
    • Episodic semi-gradient
    • Semi-gradient n-step Sarsa
    • Differential semi-gradient n-step Sarsa
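
For the on-policy control entries above, this is what a single episodic semi-gradient Sarsa update with a linear action-value approximator looks like. The feature function phi(s, a) is assumed to be supplied by the caller and to return a NumPy vector; this is a generic sketch, not the package's implementation.

import numpy as np

def semi_gradient_sarsa_update(w, phi, s, a, r, s_next, a_next, done,
                               alpha=0.01, gamma=1.0):
    # One semi-gradient Sarsa update for q_hat(s, a; w) = w . phi(s, a).
    q_sa = w @ phi(s, a)
    target = r if done else r + gamma * (w @ phi(s_next, a_next))
    # Semi-gradient: the target is treated as a constant, so grad of q_hat is phi(s, a).
    w = w + alpha * (target - q_sa) * phi(s, a)
    return w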

All solvers work just by defining states, actions, and a transition function. A transition function takes a state and an action and returns a tuple containing the next state and the reward, together with a boolean indicating whether the episode has terminated.

states: Sequence[Any]
actions: Sequence[Any]
transition: Callable[[Any, Any], Tuple[Tuple[Any, float], bool]]

Examples

Single-State Infinite Variance (Example 5.5)

This reproduces Example 5.5 from the book, in which ordinary importance sampling produces estimates with infinite variance while weighted importance sampling behaves far better.

import numpy as np
from mypyrl import off_policy_mc, ModelFreePolicy

states = [0]
actions = ['left', 'right']

def single_state_transition(state, action):
    if action == 'right':
        return (state, 0), True
    if action == 'left':
        threshold = np.random.random()
        if threshold > 0.9:
            return (state, 1), True
        else:
            return (state, 0), False

b = ModelFreePolicy(actions, states)  # behavior policy; equiprobable (1/2 each) by default
pi = ModelFreePolicy(actions, states)
pi.pi[0] = np.array([1, 0])  # target policy always selects 'left'

# estimate the state-value function with ordinary and weighted importance sampling
vqpi_ord, samples_ord = off_policy_mc(states, actions, single_state_transition,
    policy=pi, b=b, ordinary=True, first_visit=True, gamma=1., n_episodes=1E4)

vqpi_w, samples_w = off_policy_mc(states, actions, single_state_transition, 
    policy=pi, b=b, ordinary=False, first_visit=True, gamma=1., n_episodes=1E4)

Contributing

While the code in this package provides a basic implementation of the algorithms from the book, it is not necessarily the most efficient or well-written. If you have suggestions for improving the code, please feel free to open an issue.

In addition to the code, this repository contains Jupyter notebooks with examples of how to use the implemented algorithms and models. The notebooks mostly reproduce examples from Sutton and Barto's book.

Overall, this package is a useful resource for anyone interested in learning about reinforcement learning and implementing algorithms from scratch, but it is by no means production-ready.
