reacher-ddpg

A random agent	A trained agent

Examples of a random agent and an agent trained to position its arm within the (green) target region.

The Model and Environment

This project implements Deep Deterministic Policy Gradient (DDPG) [1] to solve the "Reacher" Unity ML Environment. The goal in this environment is to train an agent to maneuver a robotic arm so that its hand stays within the target region (shown as a green sphere in the GIFs above). This project was completed as part of the (Unity-sponsored) Udacity course on Deep Reinforcement Learning.
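One ingredient of DDPG as described in [1] is the use of slowly updated target networks. The snippet below is a minimal, framework-free sketch of that "soft" update rule, not the code used in this repository: the function name `soft_update`, the flat parameter lists, and the default `tau` (the value reported in [1]) are assumptions made for illustration.

```python
def soft_update(target_params, online_params, tau=0.001):
    """Move each target parameter a fraction tau toward its online counterpart.

    Implements target <- tau * online + (1 - tau) * target, in place.
    Parameters are plain lists of floats here (an illustrative simplification).
    """
    for i, (t, o) in enumerate(zip(target_params, online_params)):
        target_params[i] = tau * o + (1.0 - tau) * t
    return target_params


# Hypothetical parameter values, with a large tau so the effect is visible.
target = [0.0, 1.0]
online = [1.0, 1.0]
soft_update(target, online, tau=0.5)
print(target)
```

A small `tau` keeps the targets used in the critic's loss changing slowly, which [1] reports as helping training stability.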

Observation Space

33 real-valued variables describing the position, rotation, velocity, and angular velocity of the arm.

Action Space

A float vector of size 4. The arm has 2 joints, and each joint is driven by 2 torque values. (Every entry is bounded to the range [-1, +1].)
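Because every entry must lie in [-1, +1], an action is typically clamped before it is sent to the environment, for example after exploration noise has been added to the policy output. The following is a minimal sketch of that clamping step; `clip_action` and the sample values are hypothetical and not taken from this repository.

```python
def clip_action(action, low=-1.0, high=1.0):
    """Clamp every entry of an action vector to the range [low, high]."""
    return [max(low, min(high, float(a))) for a in action]


# Hypothetical raw values for the 4 torque entries (policy output plus noise).
raw_action = [0.3, -1.7, 1.2, -0.05]
action = clip_action(raw_action)
print(action)
```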

Reward Function

The agent receives a reward of +0.1 for every time step the hand is within the desired region.

Solved

The environment is considered solved when the agent achieves an average total reward of +30 or more per episode over a window of 100 episodes.
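With +0.1 per time step, an average score of +30 corresponds to the hand being inside the target region for roughly 300 time steps per episode. The check itself can be sketched as a rolling window over episode scores. The class below is an illustrative assumption (the name `SolvedTracker` and its methods are not from this repository), showing one way to track the criterion.

```python
from collections import deque

SOLVED_SCORE = 30.0  # average episode score required
WINDOW = 100         # number of most recent episodes to average over


class SolvedTracker:
    """Keep the most recent episode scores and test the solved criterion."""

    def __init__(self, window=WINDOW, threshold=SOLVED_SCORE):
        self.scores = deque(maxlen=window)  # old scores drop off automatically
        self.threshold = threshold

    def add(self, episode_score):
        """Record one episode's total reward and return the solved status."""
        self.scores.append(episode_score)
        return self.is_solved()

    def average(self):
        return sum(self.scores) / len(self.scores) if self.scores else 0.0

    def is_solved(self):
        # Require a full window so a few lucky early episodes do not count.
        window_full = len(self.scores) == self.scores.maxlen
        return window_full and self.average() >= self.threshold
```

In a training loop, `tracker.add(score)` would be called once per episode, and training could stop the first time it returns `True`.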

Setup

Code Setup

Based on the Udacity setup (see here), Conda/virtualenv can be used to install the required dependencies. For example:

virtualenv --python=python3 venv
source venv/bin/activate
pip install -r requirements.txt

Environment Setup

The environment executable can be downloaded for different platforms.

Project Structure

`Report.md`

A document describing some of the implementation details and ideas for future work.

`main.py`

Contains a CLI used for training and visualizing the model.

`ddpg/`

The main module.

`models.py`

Contains definitions of the Policy/Q-Net (Actor/Critic) models.

`runner.py`

Contains functions for training and visualizing agents.

`utils.py`

Various utilities for managing the environment and training loop.

`models/`

Contains pretrained models.

`policy_net.pth`

A pre-trained Policy Net (Actor).

`q_net.pth`

A pre-trained Q-Net (Critic).

Training and Visualizing

The `main.py` script can be used to train agents and visualize them.

To train:

python3 main.py train

To visualize:

python3 main.py visualize models/policy_net.pth
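For readers curious how a two-command CLI like this can be structured, here is a minimal `argparse` sketch. It only mirrors the two invocations shown above; the actual `main.py` may be organized differently, and the argument name `policy_path` and the help strings are assumptions.

```python
import argparse


def build_parser():
    """Build a parser with `train` and `visualize` subcommands (illustrative)."""
    parser = argparse.ArgumentParser(prog="main.py")
    subparsers = parser.add_subparsers(dest="command", required=True)

    # `python3 main.py train`
    subparsers.add_parser("train", help="train an agent")

    # `python3 main.py visualize <path to a saved policy net>`
    visualize = subparsers.add_parser("visualize", help="watch a trained agent")
    visualize.add_argument("policy_path", help="path to a saved policy net")

    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["visualize", "models/policy_net.pth"])
print(args.command, args.policy_path)
```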

References

[1] Lillicrap, Timothy P., Jonathan J. Hunt, Alexander Pritzel, Nicolas Manfred Otto Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra. "Continuous control with deep reinforcement learning." CoRR abs/1509.02971 (2016).
