Actor-Critic

The A2C Reinforcement Learning Method

Introduction

This project contains an implementation of the Advantage Actor-Critic (A2C) reinforcement learning method and includes an example on Cart-Pole. Cart-Pole is a game in which the player (in this case, our agent) attempts to balance a pole on a cart. At each time step, the player pushes the cart either left or right with the same fixed force. An episode is lost if the pole tilts more than 15 degrees from vertical in either direction, and it is won if the player survives 200 time steps. To be considered a solution, an agent must survive an average of at least 195 time steps over 100 or more episodes.
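The solve criterion can be expressed in a few lines of Python. This is a minimal sketch, not code from this repository: the function name `is_solved` and its parameters are hypothetical, and using the most recent 100 episodes as the averaging window is an assumption.

```python
def is_solved(episode_rewards, window=100, threshold=195.0):
    """Return True if the mean reward over the last `window` episodes reaches `threshold`.

    `episode_rewards` holds one total reward per episode (for Cart-Pole,
    the number of time steps survived).
    """
    if len(episode_rewards) < window:
        return False  # not enough episodes yet to judge
    recent = episode_rewards[-window:]
    return sum(recent) / window >= threshold
```

Calling `is_solved` after every episode gives a simple stopping condition for a training loop.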

Results

Side-by-side comparison of random agent (takes random actions) and trained A2C agent:

Rewards at each episode for 4 separate trials:

Training can be quite unstable, even with extensive hyperparameter tuning.

Implementation Details

OpenAI Gym provides a variety of reinforcement learning environments: https://gym.openai.com/envs/

This project uses its CartPole-v0 environment.

At each time step, the agent provides an action to the environment and the environment provides an observation and a reward. In the case of Cart-Pole the reward at each time step is 1, such that the total reward for each episode depends on how long the agent survives the game. An observation is an array consisting of the following: (cart position, cart velocity, pole angle, pole rotation rate).
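The interaction loop described above can be sketched as follows. To keep the example runnable without Gym installed, it uses a hypothetical stand-in class, `StubCartPole`, that only mimics the interface (its pole never falls, so every episode lasts the full 200 steps). The names `run_episode` and `random_policy` are also illustrative. The sketch assumes the older Gym API in which `reset()` returns an observation and `step()` returns four values; newer Gym releases return an extra value from each.

```python
import random


class StubCartPole:
    """Hypothetical stand-in for a Gym environment (not real Cart-Pole physics)."""

    def __init__(self, max_steps=200):
        self.max_steps = max_steps
        self.steps = 0

    def reset(self):
        self.steps = 0
        # (cart position, cart velocity, pole angle, pole rotation rate)
        return [0.0, 0.0, 0.0, 0.0]

    def step(self, action):
        self.steps += 1
        observation = [0.0, 0.0, 0.0, 0.0]
        reward = 1.0  # +1 for every time step survived
        done = self.steps >= self.max_steps
        return observation, reward, done, {}


def run_episode(env, policy):
    """Play one episode and return the total reward collected."""
    observation = env.reset()
    total_reward, done = 0.0, False
    while not done:
        action = policy(observation)
        observation, reward, done, _info = env.step(action)
        total_reward += reward
    return total_reward


def random_policy(observation):
    """Ignore the observation and pick an action at random."""
    return random.choice([0, 1])  # 0 = push left, 1 = push right


total = run_episode(StubCartPole(), random_policy)
```

With a real environment, `StubCartPole()` would be replaced by the object returned from `gym.make("CartPole-v0")`, and a trained agent would replace `random_policy`.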

This implementation of A2C uses two neural networks:

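Actor: takes an observation as input and outputs one score per action, which is turned into action probabilities (the snippet shown here ends in a linear layer, so a softmax over its two outputs is needed to obtain probabilities)

```
# Actor for Cart-Pole: 4 observation values in, 2 action scores out
self.actor = nn.Sequential(
    nn.Linear(4, 128),
    nn.ReLU(),
    nn.Linear(128, 2)
).double()  # use float64 parameters
```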
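To illustrate how two raw actor outputs become a sampled action, here is a minimal pure-Python sketch. It is not taken from src/a2c.py; the helper names `softmax` and `sample_action` are hypothetical, and whether the repository applies a softmax at this point is an assumption. In PyTorch the same step is typically done with `torch.softmax` and `torch.distributions.Categorical`.

```python
import math
import random


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(logits, rng=random):
    """Sample an action index with probability softmax(logits)[index]."""
    probs = softmax(logits)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


probs = softmax([0.0, 0.0])          # equal scores give equal probabilities
action = sample_action([2.0, -1.0])  # usually 0, sometimes 1
```

Sampling rather than always taking the highest-probability action keeps the agent exploring during training.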

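Critic: takes an observation as input and outputs a single value that estimates the expected return from the current state

```
# Critic for Cart-Pole: 4 observation values in, 1 value estimate out
self.critic = nn.Sequential(
    nn.Linear(4, 128),
    nn.ReLU(),
    nn.Linear(128, 1)
).double()  # use float64 parameters
```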
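The critic's value estimate is what makes the "advantage" in A2C computable: the advantage of a step is how much better the observed return was than the critic predicted. The sketch below is an assumption about one common way to compute it (discounted Monte Carlo returns with a discount factor `gamma`); this README does not state which variant src/a2c.py uses, and the function names are hypothetical.

```python
def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = []
    running = 0.0
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def advantages(returns, values):
    """Advantage = observed return minus the critic's estimate for that state."""
    return [g - v for g, v in zip(returns, values)]


rewards = [1.0, 1.0, 1.0]                          # Cart-Pole gives +1 per step
returns = discounted_returns(rewards, gamma=0.5)   # [1.75, 1.5, 1.0]
adv = advantages(returns, values=[1.0, 1.0, 1.0])  # [0.75, 0.5, 0.0]
```

In a typical A2C update, the actor's loss weights the negative log-probability of each chosen action by its advantage, while the critic is regressed toward the returns.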

Note: The above code creates network architectures for Cart-Pole. The actual module in src/a2c.py infers the input and output dimensions, so it can be used with any OpenAI Gym environment.
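As a rough illustration of how such dimensions can be inferred, the sketch below reads them from the environment's `observation_space` and `action_space` attributes. It is not the code in src/a2c.py: the function name `infer_dims` is hypothetical, and it assumes a flat vector observation space and a discrete action space. A stand-in object replaces the real environment so the example runs without Gym installed.

```python
from types import SimpleNamespace


def infer_dims(env):
    """Read network input/output sizes from a Gym-style environment.

    Assumes a flat, vector-valued observation space (with a `shape`)
    and a discrete action space (with a count `n`).
    """
    n_inputs = env.observation_space.shape[0]
    n_actions = env.action_space.n
    return n_inputs, n_actions


# Stand-in object mimicking CartPole's spaces (hypothetical, for illustration only).
fake_cartpole = SimpleNamespace(
    observation_space=SimpleNamespace(shape=(4,)),
    action_space=SimpleNamespace(n=2),
)
dims = infer_dims(fake_cartpole)  # (4, 2)
```

The two numbers would then replace the hard-coded `4` and `2` in the first and last `nn.Linear` layers of the actor, and the `4` in the critic.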

Built With

Installation and Running Scripts

Clone the repo and change into the directory
```
$ git clone https://github.com/Lucasc-99/Actor-Critic.git
$ cd Actor-Critic
```

Install PyTorch and Gym
```
$ pip3 install torch
$ pip3 install gym
```

Run scripts
```
$ python3 -m src.cart-pole-baseline
$ python3 -m src.cart-pole-a2c
```
