bekerov / A3C-LSTM

A3C-LSTM algorithm tested on the OpenAI Gym CartPole environment

Implementation of the Asynchronous Advantage Actor-Critic algorithm using a Long Short-Term Memory network (A3C-LSTM)
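For orientation, here is a minimal sketch of the kind of actor-critic network such an implementation builds, assuming the TensorFlow 1.x API of the tutorial this code is based on; the class and parameter names (ACNetwork, lstm_units, and so on) are illustrative, not this repository's actual code:

```python
import tensorflow as tf

class ACNetwork(object):
    """Shared LSTM torso with separate policy (actor) and value (critic) heads."""
    def __init__(self, state_size=4, action_size=2, lstm_units=64):
        # CartPole observations are 4-dimensional; a "batch" is one rollout.
        self.inputs = tf.placeholder(tf.float32, [None, state_size])
        hidden = tf.layers.dense(self.inputs, 64, activation=tf.nn.elu)

        # Treat the rollout as a length-N sequence for a single-batch LSTM.
        cell = tf.nn.rnn_cell.BasicLSTMCell(lstm_units)
        self.state_in = cell.zero_state(1, tf.float32)
        rnn_in = tf.expand_dims(hidden, 0)  # shape [1, steps, 64]
        lstm_out, self.state_out = tf.nn.dynamic_rnn(
            cell, rnn_in, initial_state=self.state_in)
        rnn_out = tf.reshape(lstm_out, [-1, lstm_units])

        # Actor head: a distribution over actions. Critic head: a state value.
        self.policy = tf.layers.dense(rnn_out, action_size,
                                      activation=tf.nn.softmax)
        self.value = tf.layers.dense(rnn_out, 1)
```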

Modified from the work of Arthur Juliani: Simple Reinforcement Learning with Tensorflow Part 8: Asynchronous Actor-Critic Agents (A3C)

The paper can be found here: "Asynchronous Methods for Deep Reinforcement Learning" - Mnih et al., 2016

Tested on CartPole

Requirements

OpenAI Gym and TensorFlow.

Usage

Training only happens on minibatches of more than 30 transitions, which effectively prevents poorly performing (short) episodes from influencing training. A reward scaling factor is used to allow effective training at higher learning rates; a sketch of both mechanisms follows.
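A hedged sketch of that gating and scaling logic; MIN_BATCH, REWARD_FACTOR, and the helper names are illustrative assumptions, not identifiers from this repository:

```python
import numpy as np

MIN_BATCH = 30        # only update once a rollout exceeds this many steps
REWARD_FACTOR = 0.01  # scale raw rewards to keep larger learning rates stable
GAMMA = 0.99          # discount factor

def discounted_returns(rewards, bootstrap_value):
    """Discounted returns over a rollout, with rewards scaled first."""
    returns = np.zeros(len(rewards))
    running = bootstrap_value
    for t in reversed(range(len(rewards))):
        running = rewards[t] * REWARD_FACTOR + GAMMA * running
        returns[t] = running
    return returns

def maybe_train(rollout, bootstrap_value, apply_update):
    """Apply a gradient update only when the rollout is long enough.

    `rollout` is a list of (state, action, reward) transitions.
    """
    if len(rollout) > MIN_BATCH:
        rewards = [r for (_, _, r) in rollout]
        targets = discounted_returns(rewards, bootstrap_value)
        apply_update(rollout, targets)
        rollout.clear()  # start the next minibatch fresh
```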

Models are saved every 100 episodes. They can be reloaded for further training, or visualised for testing, by setting the corresponding global parameter to True.
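The save/reload mechanics presumably follow the standard tf.train.Saver checkpoint pattern; a minimal sketch, where load_model and visualize stand in for the repo's global parameters:

```python
import tensorflow as tf

load_model = False   # True: resume training from the last checkpoint
visualize = False    # True: run the saved policy for rendering instead of training
model_path = './model'

# A Saver requires at least one variable in the graph.
global_episodes = tf.Variable(0, trainable=False, name='global_episodes')
saver = tf.train.Saver(max_to_keep=5)

with tf.Session() as sess:
    if load_model or visualize:
        # Restore the most recent checkpoint written during training.
        saver.restore(sess, tf.train.latest_checkpoint(model_path))
    else:
        sess.run(tf.global_variables_initializer())

    # Inside the training loop, checkpoint every 100 episodes:
    # if episode_count % 100 == 0:
    #     saver.save(sess, model_path + '/model', global_step=episode_count)
```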

This is just example code to test an A3C-LSTM implementation; it should not be considered the optimal way to solve this environment!
