doctorcorral / DRLND-p2-continuous

Project 2 on Continuous Control for Udacity - Deep Reinforcement Learning Nanodegree


Continuous control for DRLND Udacity Nanodegree


This is the second project in the Udacity Deep Reinforcement Learning Nanodegree. The task is to train a double-jointed arm to reach target locations.

The environment used for training is Unity's Reacher environment.

reacher gif

Project Details

In this environment, a double-jointed arm can move to target locations. A reward of +0.1 is provided for each step that the agent's hand is in the goal location. That is, the arm must track the target for as many time steps as possible.

The observation space consists of 33 variables corresponding to the position, rotation, velocity, and angular velocities of the arm. Each action is a vector of four numbers, corresponding to the torque applicable to the two joints. Every entry in the action vector must be a number between -1 and 1.
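For concreteness, here is a minimal interaction sketch, assuming the unityagents package used throughout the DRLND materials and a hypothetical path to the downloaded single-agent build (adjust the file name for your platform):

```python
import numpy as np
from unityagents import UnityEnvironment

# Hypothetical path to the single-agent Reacher build; adjust for your platform/download.
env = UnityEnvironment(file_name="Reacher_Linux/Reacher.x86_64")
brain_name = env.brain_names[0]

env_info = env.reset(train_mode=False)[brain_name]
state = env_info.vector_observations[0]        # 33-dimensional observation
action = np.clip(np.random.randn(4), -1, 1)    # 4 joint torques, each clipped to [-1, 1]
env_info = env.step(action)[brain_name]
reward = env_info.rewards[0]                   # +0.1 while the hand is in the goal location
done = env_info.local_done[0]
env.close()
```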

Getting Started

Follow the Udacity DRLND dependency installation instructions here

Be sure to install Unity ML-Agents, NumPy, and PyTorch.

Download a prebuilt simulator

Single agent:

Linux: click here
Mac OSX: click here
Windows (64-bit): click here

Twenty agents:

Linux: click here
Mac OSX: click here
Windows (64-bit): click here

Place the downloaded file in the same directory as this repository's contents.

Instructions

Run the DDPG_Continuous_Control.ipynb notebook using the drlnd kernel to train the DDPG agent.

Use the ddpg function to perform the training. This function returns a dictionary of relevant internal variables that can be fed back into the function to continue training where it left off. Play around with this: change hyperparameters between training runs to build your own intuition (see the toy sketch below).
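As a toy illustration of that pattern (not this repo's actual ddpg implementation; the function name and dictionary fields below are made up), the returned state can simply be passed back in to pick up where the previous run stopped:

```python
# Toy illustration only -- not this repo's ddpg(); names and fields are hypothetical.
# The idea: training returns a state dict that can be fed back in to resume.
def train(n_episodes, state=None):
    state = state or {"episode": 0, "scores": []}
    for _ in range(n_episodes):
        state["episode"] += 1
        state["scores"].append(0.0)  # a real run would append the episode score here
    return state

run = train(50)             # first training run
run = train(50, state=run)  # continue where the previous run left off
print(run["episode"])       # -> 100
```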

The environment is considered solved when the agent reaches an average score of +30 (see the example Reward Plot).
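A quick way to check this from the scores returned by training is a moving average; this is only a sketch, and the 100-episode window follows the usual DRLND convention rather than anything specific to this notebook:

```python
import numpy as np

def is_solved(scores, window=100, target=30.0):
    """True once the moving average of the last `window` episode scores reaches `target`."""
    return len(scores) >= window and float(np.mean(scores[-window:])) >= target

# Example with dummy per-episode scores:
print(is_solved([31.0] * 120))  # -> True
```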

Once training is complete, the model weights are saved in the same directory in the files checkpoint_actor.pth and checkpoint_critic.pth.

These weights are loaded by the run_agent.ipynb notebook to run the trained agent against the simulator (a rough sketch of the idea follows).
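The sketch below assumes the Actor network class and constructor arguments match this repo's model code (the import path and signature are guesses), and reuses the unityagents API shown earlier:

```python
import torch
from unityagents import UnityEnvironment
from model import Actor  # assumed module/class name; check this repo's actual code

env = UnityEnvironment(file_name="Reacher_Linux/Reacher.x86_64")  # hypothetical path
brain_name = env.brain_names[0]

# Constructor arguments are an assumption; the sizes come from the Project Details section.
actor = Actor(state_size=33, action_size=4)
actor.load_state_dict(torch.load("checkpoint_actor.pth", map_location="cpu"))
actor.eval()

env_info = env.reset(train_mode=False)[brain_name]
state = env_info.vector_observations[0]
score, done = 0.0, False
while not done:
    with torch.no_grad():
        action = actor(torch.from_numpy(state).float()).numpy()
    env_info = env.step(action)[brain_name]
    state = env_info.vector_observations[0]
    score += env_info.rewards[0]
    done = env_info.local_done[0]

print(f"Episode score: {score:.2f}")
env.close()
```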


