deep-q-learning dynamic-programming gridworld policy-evaluation policy-iteration q-learning reinforce reinforcement-learning value-iteration

🤖 REINFORCEpy

A Python implementation of Karpathy's REINFORCEjs library. The original library is written in JavaScript; this repository aims to implement its RL algorithms and demos in Python.

Note that this is not a 1-to-1 port. The goal is simply to develop algorithms and demos similar to those in Karpathy's library.

Value Iteration

We started by implementing the simplest algorithm, Value Iteration, from scratch.

The following shows an example of the value function for different iterations.


Value function after $1$ iteration	Value function after $100$ iterations
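
To make the idea concrete, here is a minimal sketch of value iteration on a small gridworld. It is not the code from this repository: the function name `value_iteration`, the reward of $1$ for stepping into the bottom-right goal cell, the discount `gamma=0.9`, and the deterministic four-direction moves are all assumptions chosen for illustration. Each sweep applies the Bellman optimality update, setting the value of every cell to the best one-step reward plus the discounted value of the cell it leads to.

```python
def value_iteration(grid_size=10, iterations=100, gamma=0.9):
    """Synchronous value iteration on a deterministic gridworld.

    Assumptions (illustrative, not taken from the repository):
    - the goal is the bottom-right cell and is terminal (value 0),
    - stepping into the goal yields reward 1, every other step yields 0,
    - moving into a wall leaves the agent in place.
    """
    goal = (grid_size - 1, grid_size - 1)
    moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
    values = [[0.0] * grid_size for _ in range(grid_size)]

    for _ in range(iterations):
        new_values = [[0.0] * grid_size for _ in range(grid_size)]
        for row in range(grid_size):
            for col in range(grid_size):
                if (row, col) == goal:
                    continue  # terminal state keeps value 0
                candidates = []
                for d_row, d_col in moves:
                    # Clamp to the grid so walls act as "stay in place".
                    next_row = min(max(row + d_row, 0), grid_size - 1)
                    next_col = min(max(col + d_col, 0), grid_size - 1)
                    reward = 1.0 if (next_row, next_col) == goal else 0.0
                    candidates.append(reward + gamma * values[next_row][next_col])
                # Bellman optimality update: take the best action.
                new_values[row][col] = max(candidates)
        values = new_values
    return values


# Compare an early and a late estimate on a tiny grid.
after_one = value_iteration(grid_size=3, iterations=1)
after_many = value_iteration(grid_size=3, iterations=100)
```

Under these assumptions, only the cells next to the goal are non-zero after a single sweep, while after many sweeps the values decay geometrically with distance from the goal.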

🏃 How to Run?

Several parameters can be set when running main.py. An example call looks like this:

python main.py \
    --seed=42 \
    --verbose=1 \
    --episodes=1 \
    --timesteps=1 \
    --grid_size=10 \
    --algo=value_iteration \
    --render_large=True \
    --render_with_values=True

All supported arguments are listed below:

usage: 
  main.py [--seed] [--verbose] [--episodes] [--timesteps] [--grid_size] [--algo] 
          [--render_large] [--render_with_values]

| Argument | Help | Default |
| --- | --- | --- |
| `--seed` | random seed | $42$ |
| `--verbose` | verbosity level | $1$ |
| `--episodes` | number of episodes | $1$ |
| `--timesteps` | maximal number of timesteps | $1,000$ |
| `--grid_size` | size of the gridworld | $10$ |
| `--algo` | learning algorithm | `value_iteration` |
| `--render_large` | render large gridworld | `False` |
| `--render_with_values` | render gridworld with value estimates | `False` |
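
For readers who want to see how such arguments might be wired up, the following is a sketch using the standard `argparse` module. It only mirrors the names, help texts, and defaults listed above; the helper names `str_to_bool` and `build_parser` are hypothetical, and the actual parser in main.py may be written differently.

```python
import argparse


def str_to_bool(text):
    """Convert strings such as 'True' or 'False' to a bool.

    Hypothetical helper: plain `type=bool` would treat any non-empty
    string, including 'False', as True.
    """
    lowered = text.lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {text!r}")


def build_parser():
    """Build a parser mirroring the documented arguments (assumed layout)."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("--seed", type=int, default=42, help="random seed")
    parser.add_argument("--verbose", type=int, default=1, help="verbosity level")
    parser.add_argument("--episodes", type=int, default=1, help="number of episodes")
    parser.add_argument("--timesteps", type=int, default=1000,
                        help="maximal number of timesteps")
    parser.add_argument("--grid_size", type=int, default=10,
                        help="size of the gridworld")
    parser.add_argument("--algo", type=str, default="value_iteration",
                        help="learning algorithm")
    parser.add_argument("--render_large", type=str_to_bool, default=False,
                        help="render large gridworld")
    parser.add_argument("--render_with_values", type=str_to_bool, default=False,
                        help="render gridworld with value estimates")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["--grid_size=5", "--render_large=True"])
```

Arguments that are not passed keep the defaults from the table, so `args.timesteps` stays at 1000 in this example.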

📝 ToDo's

Added to docs/changelog.md

MIT License

Languages

Jupyter Notebook 80.9%, Python 19.1%