nlddfn / Udacity_RL_P2_CControl

Project 2: Train an agent on a Unity environment (Reacher) for continuous control using Deep Deterministic Policy Gradient (DDPG)



Train an agent in a Unity environment (Reacher) using Deep Deterministic Policy Gradient

Introduction

For this project, 20 double-jointed arms are trained to move to target locations.

(Animation of the trained agents)

A reward of +0.1 is provided for each time step that the agent's hand is in the target location. The agent's goal is therefore to maintain its position at the target location for as many time steps as possible. The environment is considered solved when the trained agents achieve an average score of +30 over 100 consecutive episodes (where each episode's score is averaged over all 20 agents).
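
For orientation, here is a minimal sketch of how that criterion can be tracked (a hypothetical helper, not code from this repo): each episode's score is the mean return over the 20 agents, and the environment counts as solved once the 100-episode average of those scores reaches +30.

```python
from collections import deque

import numpy as np

# Hypothetical helper (not part of this repo): keep the last 100 episode scores,
# where each episode's score is the mean return over the 20 agents.
scores_window = deque(maxlen=100)

def record_episode(agent_returns):
    """agent_returns: array-like of length 20 with each agent's episode return."""
    episode_score = np.mean(agent_returns)          # average over the 20 agents
    scores_window.append(episode_score)
    solved = len(scores_window) == 100 and np.mean(scores_window) >= 30.0
    return episode_score, solved
```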

The observation space consists of 33 variables corresponding to the position, rotation, velocity, and angular velocities of the arm. Each action is a vector with four numbers, corresponding to the torque applied to the two joints. Every entry in the action vector must be a number between -1 and 1. More details can be found here.
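
For reference, this is roughly how the Reacher environment is queried with the unityagents package used by the Udacity projects; the file name and the random-action step below are illustrative assumptions, not code from this repo.

```python
import numpy as np
from unityagents import UnityEnvironment

# The file name is an assumption; point it at your decompressed environment build.
env = UnityEnvironment(file_name="Reacher.app")
brain_name = env.brain_names[0]
brain = env.brains[brain_name]

env_info = env.reset(train_mode=True)[brain_name]
states = env_info.vector_observations            # shape (20, 33): one row per agent
action_size = brain.vector_action_space_size     # 4 torque values per agent

# Random actions, clipped to [-1, 1] as the environment requires.
actions = np.clip(np.random.randn(len(env_info.agents), action_size), -1, 1)
env_info = env.step(actions)[brain_name]
rewards = env_info.rewards                       # +0.1 per agent per step on target
dones = env_info.local_done
env.close()
```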

Distributed Training

Each agent acts independently in the environment. This setup is useful for algorithms like PPO, A3C, and D4PG that use multiple (non-interacting, parallel) copies of the same agent to distribute the task of gathering experience.
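
As a hedged sketch of how the 20 parallel agents can feed a single learner, the function below pushes every agent's transition into one shared replay buffer; agent.act and memory.add are assumed placeholder interfaces, not this repo's actual API.

```python
import numpy as np

def collect_step(env, brain_name, agent, memory, states):
    """Advance all 20 agents by one step and store every transition in a single
    shared replay buffer. `agent.act` and `memory.add` are hypothetical
    interfaces used only for illustration."""
    actions = np.clip(agent.act(states), -1, 1)   # one 4-dim action per agent
    env_info = env.step(actions)[brain_name]      # unityagents step call
    next_states = env_info.vector_observations
    rewards = env_info.rewards
    dones = env_info.local_done
    for s, a, r, s2, d in zip(states, actions, rewards, next_states, dones):
        memory.add(s, a, r, s2, d)                # 20 transitions added per step
    return next_states, rewards, dones
```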

Getting Started

  1. Download the environment from one of the links below. You need only select the environment that matches your operating system:

    (For Windows users) Check out this link if you need help with determining if your computer is running a 32-bit version or 64-bit version of the Windows operating system.

    (For AWS) If you'd like to train the agent on AWS (and have not enabled a virtual screen), then please use this link to obtain the "headless" version of the environment. You will not be able to watch the agent without enabling a virtual screen, but you will be able to train the agent. (To watch the agent, you should follow the instructions to enable a virtual screen, and then download the environment for the Linux operating system above.)

  2. Clone the Udacity_RL_P2_CControl GitHub repository, place the downloaded environment file in the repository folder, and decompress it.

  3. Create a virtual environment and install the required libraries. For OSX users, the Makefile included in the repo can be used: running make all will create a new venv called Udacity_RL_P2 and install the dependencies needed to execute the notebook.

  4. Activate the virtual environment using source ./Udacity_RL_P2/bin/activate

  5. Type jupyter lab and select the Udacity_RL_P2 kernel.

Train and execute the model

Within the virtual environment, you can train and evaluate the model by running python main.py. By default, the script loads the environment and evaluates a pre-trained model. If you want to retrain the model, set TRAIN = True in main.py and then run the script.
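
As a rough illustration of that switch (the function names below are placeholders, not the actual contents of main.py):

```python
# Hedged sketch of the TRAIN flag described above; the placeholder functions
# stand in for the real training/evaluation code in main.py.
TRAIN = False   # set to True to re-train the model instead of evaluating it

def train():
    print("Training the DDPG agent and saving checkpoints...")    # placeholder

def evaluate():
    print("Loading saved weights and running the trained agent")  # placeholder

if __name__ == "__main__":
    train() if TRAIN else evaluate()
```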

You can also train and evaluate the model with the notebook Continuous_Control.ipynb. Set the train flag to True to re-train the model. Further details can be found here.
