PeeteKeesel / reinforce-py

🐍 Implementation of the REINFORCEjs library from Karpathy in Python

🤖 REINFORCEpy

Implementation of the REINFORCEjs library from Karpathy in Python. The original library is written in JavaScript. The objective of this repository is to implement its RL algorithms and demos in Python.

Note that this is not a 1-to-1 port. The idea is simply to develop algorithms and demos similar to those shown in Karpathy's library.

Value Iteration

We started by implementing the simplest algorithm, Value Iteration, from scratch.

The following shows an example of the value function after different numbers of iterations.

| After 1 iteration | After 100 iterations |
| --- | --- |
| [Figure: value function after $1$ iteration] | [Figure: value function after $100$ iterations] |
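To make the idea concrete, here is a minimal sketch of value iteration on a deterministic gridworld. It is not the code from this repository: the function name `value_iteration`, the discount factor `gamma`, the goal in the bottom-right corner, and the reward of 1 for entering the goal are all assumptions chosen for illustration.

```python
def value_iteration(grid_size=10, gamma=0.9, iterations=100, goal_reward=1.0):
    """Synchronous value iteration on a deterministic gridworld (illustrative)."""
    # Assumption: a single terminal goal cell in the bottom-right corner.
    goal = (grid_size - 1, grid_size - 1)
    # Assumption: four actions (up, down, left, right).
    moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    values = [[0.0] * grid_size for _ in range(grid_size)]

    for _ in range(iterations):
        new_values = [[0.0] * grid_size for _ in range(grid_size)]
        for row in range(grid_size):
            for col in range(grid_size):
                if (row, col) == goal:
                    continue  # the terminal state keeps value 0
                candidates = []
                for d_row, d_col in moves:
                    # A move that would leave the grid keeps the agent in place.
                    n_row = min(max(row + d_row, 0), grid_size - 1)
                    n_col = min(max(col + d_col, 0), grid_size - 1)
                    reward = goal_reward if (n_row, n_col) == goal else 0.0
                    candidates.append(reward + gamma * values[n_row][n_col])
                # Bellman optimality backup: keep the best action value.
                new_values[row][col] = max(candidates)
        values = new_values
    return values


if __name__ == "__main__":
    for row in value_iteration(grid_size=4, iterations=100):
        print(" ".join(f"{v:.2f}" for v in row))
```

Under these assumptions, only the cells next to the goal are non-zero after one iteration. With more iterations the values spread outward, shrinking by a factor of `gamma` per step of distance from the goal.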

🏃 How to Run?

Several parameters can be set when running main.py. An example call looks like this:

python main.py \
    --seed=42 \
    --verbose=1 \
    --episodes=1 \
    --timesteps=1 \
    --grid_size=10 \
    --algo=value_iteration \
    --render_large=True \
    --render_with_values=True

All supported arguments are listed below:

usage: 
  main.py [--seed] [--verbose] [--episodes] [--timesteps] [--grid_size] [--algo] 
          [--render_large] [--render_with_values]

| Argument | Help | Default |
| --- | --- | --- |
| --seed | random seed | $42$ |
| --verbose | verbosity level | $1$ |
| --episodes | number of episodes | $1$ |
| --timesteps | maximal number of timesteps | $1,000$ |
| --grid_size | size of the gridworld | $10$ |
| --algo | learning algorithm | value_iteration |
| --render_large | render large gridworld | False |
| --render_with_values | render gridworld with value estimates | False |
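
The arguments and defaults listed above could be declared with `argparse` roughly as follows. This is only a sketch of one possible setup, not the actual contents of main.py: the helpers `str_to_bool` and `build_parser` are hypothetical names, and parsing `True`/`False` strings is an assumption based on the `--render_large=True` form used in the example call.

```python
import argparse


def str_to_bool(text):
    """Interpret strings such as 'True' or 'false' as booleans (hypothetical helper)."""
    lowered = text.lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {text!r}")


def build_parser():
    """Build a parser mirroring the documented arguments and defaults."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("--seed", type=int, default=42, help="random seed")
    parser.add_argument("--verbose", type=int, default=1, help="verbosity level")
    parser.add_argument("--episodes", type=int, default=1, help="number of episodes")
    parser.add_argument("--timesteps", type=int, default=1000,
                        help="maximal number of timesteps")
    parser.add_argument("--grid_size", type=int, default=10,
                        help="size of the gridworld")
    parser.add_argument("--algo", type=str, default="value_iteration",
                        help="learning algorithm")
    parser.add_argument("--render_large", type=str_to_bool, default=False,
                        help="render large gridworld")
    parser.add_argument("--render_with_values", type=str_to_bool, default=False,
                        help="render gridworld with value estimates")
    return parser


if __name__ == "__main__":
    # Parse an explicit list here so the example does not depend on sys.argv.
    args = build_parser().parse_args(["--grid_size=5", "--render_large=True"])
    print(args)
```

A custom boolean converter is used because `type=bool` in `argparse` treats every non-empty string, including `"False"`, as true.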

📝 ToDo's

The to-do items have been added to docs/changelog.md.

License: MIT License

Languages: Jupyter Notebook 80.9%, Python 19.1%