Deep Reinforcement Learning

This code is part of my master thesis at the VUB, Brussels.

Status

Different algorithms have already been implemented:

Sarsa + function approximation

The following parts are combined to learn to act in the Mountain Car environment (a rough sketch of how they fit together follows the list):

  • Sarsa
  • Eligibility traces
  • epsilon-greedy action selection policy
  • Function approximation using tile coding
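
As an illustration only, here is a rough sketch (not the repository's SarsaFA.py) of how these pieces can fit together on MountainCar-v0. It assumes the classic Gym API in which env.reset() returns the state and env.step() returns (state, reward, done, info); the tile-coding scheme and all hyperparameters are simplified placeholders.

import numpy as np
import gym

n_tilings = 8                      # number of overlapping tilings
n_bins = 10                        # bins per state dimension in each tiling
alpha = 0.1 / n_tilings            # learning rate per tiling
gamma = 1.0                        # discount factor
lam = 0.9                          # eligibility-trace decay
epsilon = 0.1                      # exploration probability

env = gym.make("MountainCar-v0")
low, high = env.observation_space.low, env.observation_space.high
n_actions = env.action_space.n
n_features = n_tilings * n_bins * n_bins
weights = np.zeros((n_actions, n_features))

def active_tiles(state):
    # One active tile per tiling; each tiling is offset by a fraction of a bin.
    indices = []
    for t in range(n_tilings):
        offset = float(t) / n_tilings
        scaled = (state - low) / (high - low) * (n_bins - 1) + offset
        x, y = np.clip(scaled.astype(int), 0, n_bins - 1)
        indices.append(t * n_bins * n_bins + x * n_bins + y)
    return indices

def q_value(state, action):
    return weights[action, active_tiles(state)].sum()

def choose_action(state):
    # epsilon-greedy action selection
    if np.random.rand() < epsilon:
        return env.action_space.sample()
    return int(np.argmax([q_value(state, a) for a in range(n_actions)]))

for episode in range(500):
    traces = np.zeros_like(weights)            # eligibility traces
    state = env.reset()
    action = choose_action(state)
    done = False
    while not done:
        next_state, reward, done, _ = env.step(action)
        delta = reward - q_value(state, action)
        if not done:
            next_action = choose_action(next_state)
            delta += gamma * q_value(next_state, next_action)
        traces *= gamma * lam                  # decay all traces
        traces[action, active_tiles(state)] += 1.0
        weights += alpha * delta * traces      # Sarsa(lambda) update
        if done:
            break
        state, action = next_state, next_action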

Example of a run after training with a fully greedy action selection policy for 729 episodes of 200 steps each: Example run

Total reward per episode: Total reward per episode

Note that, even after a few thousand episodes, the algorithm is still not able to consistently reach the goal in fewer than 200 steps.

REINFORCE

An adapted version of this code, modified to work with TensorFlow. Total reward per episode when applying this algorithm to the CartPole-v0 environment: Total reward per episode using REINFORCE
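
For reference, here is a minimal NumPy-only sketch of the REINFORCE update on CartPole-v0, using a linear softmax policy instead of the TensorFlow network in this repository, and again assuming the classic Gym step/reset API:

import numpy as np
import gym

env = gym.make("CartPole-v0")
n_obs = env.observation_space.shape[0]
n_actions = env.action_space.n
theta = np.zeros((n_obs, n_actions))   # parameters of a linear softmax policy
alpha, gamma = 0.01, 0.99

def policy(state):
    # Softmax action probabilities for the linear policy
    logits = state @ theta
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

for episode in range(300):
    states, actions, rewards = [], [], []
    state, done = env.reset(), False
    while not done:
        probs = policy(state)
        action = np.random.choice(n_actions, p=probs)
        next_state, reward, done, _ = env.step(action)
        states.append(state)
        actions.append(action)
        rewards.append(reward)
        state = next_state

    # Discounted return G_t for every step of the episode
    returns = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running

    # REINFORCE update: theta += alpha * G_t * grad log pi(a_t | s_t)
    for s, a, g in zip(states, actions, returns):
        probs = policy(s)
        grad_log = -np.outer(s, probs)   # gradient of log softmax w.r.t. theta
        grad_log[:, a] += s
        theta += alpha * g * grad_log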

Karpathy Policy Gradient

An adapted version of the code from this article by Andrej Karpathy. Total reward per episode when applying this algorithm to the CartPole-v0 environment: Total reward per episode using Karpathy

Note, however, that because of randomness, how quickly the optimal reward is reached and maintained varies heavily. Results of an earlier execution are also posted on the OpenAI Gym.
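
The main variance-reduction trick from that article can be sketched as follows: discount the rewards over time and standardize them before the policy-gradient update. The function below is purely illustrative:

import numpy as np

def discount_and_standardize(rewards, gamma=0.99):
    # Compute discounted returns and normalize them to zero mean, unit variance.
    discounted = np.zeros_like(rewards, dtype=float)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = running * gamma + rewards[t]
        discounted[t] = running
    discounted -= discounted.mean()
    discounted /= discounted.std() + 1e-8   # avoid division by zero
    return discounted

# Example: per-step rewards of a short CartPole episode
print(discount_and_standardize([1.0, 1.0, 1.0, 1.0]))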

Advantage Actor Critic

Total reward per episode when applying this algorithm to the CartPole-v0 environment: Total reward per episode using A2C

OpenAI Gym page
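
As a rough sketch of the idea (independent of this repository's A2C implementation): the critic predicts state values, the advantage is the difference between the observed return and that prediction, and the actor is updated with the advantage-weighted policy gradient. The log_probs and values below are assumed to come from the actor and critic networks:

import numpy as np

def a2c_losses(rewards, values, log_probs, gamma=0.99):
    # Return (actor_loss, critic_loss) for one rollout.
    returns = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    advantages = returns - values                    # how much better than predicted
    actor_loss = -(log_probs * advantages).mean()    # policy-gradient term
    critic_loss = (advantages ** 2).mean()           # value-function regression
    return actor_loss, critic_loss

# Example with dummy numbers for a 3-step rollout
print(a2c_losses(rewards=np.array([1.0, 1.0, 1.0]),
                 values=np.array([2.5, 1.8, 0.9]),
                 log_probs=np.array([-0.7, -0.6, -0.8])))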

Asynchronous Advantage Actor Critic

Total reward per episode when applying this algorithm to the CartPole-v0 environment: Total reward per episode using A3C

This only shows the results of one of the A3C threads. Results of another execution are also posted on the OpenAI Gym. Results of an execution using the Acrobot-v1 environment can also be found on OpenAI Gym.
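
For context, here is a rough sketch of the asynchronous layout only: several worker threads, each with its own environment, applying updates to shared parameters under a lock. It deliberately uses a random policy and a no-op update, so it is not the repository's A3C.py, and it again assumes the classic Gym API:

import threading
import numpy as np
import gym

shared_weights = {"theta": np.zeros(4)}   # placeholder for the shared parameters
lock = threading.Lock()

def worker(n_episodes=10):
    env = gym.make("CartPole-v0")          # every worker gets its own environment
    for _ in range(n_episodes):
        state, done = env.reset(), False
        while not done:
            action = env.action_space.sample()   # a real worker would use the policy
            state, reward, done, _ = env.step(action)
        # A real A3C worker computes actor-critic gradients from its rollout
        # and applies them to the shared parameters:
        with lock:
            shared_weights["theta"] += 0.0       # placeholder update

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()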

How to run

First, install the requirements using pip:

pip install -r requirements.txt

Then you can run the Sarsa + Function approximation algorithm using:

python SarsaFA.py <episodes_to_run> <monitor_target_directory>

You can run the CEM, REINFORCE, Karpathy, Karpathy_CNN, A2C and A3C algorithms using:

python <algorithm_name>.py <environment_name> <monitor_target_directory>

You can plot the episode lengths and total reward per episode graphs using:

python plot_statistics.py <path_to_stats.json> <moving_average_window>
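
As an illustration of what such a plot involves, here is a small sketch that plots the total reward per episode together with a moving average. The stats.json layout used here (a top-level "episode_rewards" list) is an assumption, not necessarily the format written by the monitor:

import json
import sys
import numpy as np
import matplotlib.pyplot as plt

path, window = sys.argv[1], int(sys.argv[2])
with open(path) as f:
    rewards = np.array(json.load(f)["episode_rewards"])

# Simple moving average over `window` episodes
smoothed = np.convolve(rewards, np.ones(window) / window, mode="valid")

plt.plot(rewards, alpha=0.3, label="total reward per episode")
plt.plot(np.arange(window - 1, len(rewards)), smoothed, label="moving average")
plt.xlabel("episode")
plt.ylabel("total reward")
plt.legend()
plt.show()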

About

Combining deep learning and reinforcement learning.

License: MIT License

