asacalow / udacity-drl-p1

Solution to Project 1 (Navigation) for Udacity's Deep Reinforcement Learning Nano-degree

README

Here be my solution to the first Navigation Project in Udacity's Deep Reinforcement Learning course.

Project Details

The environment to solve is the Banana Collector from the p1_navigation project, described in full in that project's README. For convenience, the relevant details of the rewards, state space and action space are reproduced below:

A reward of +1 is provided for collecting a yellow banana, and a reward of -1 is provided for collecting a blue banana. Thus, the goal of your agent is to collect as many yellow bananas as possible while avoiding blue bananas.

The state space has 37 dimensions and contains the agent's velocity, along with ray-based perception of objects around the agent's forward direction. Given this information, the agent has to learn how to best select actions. Four discrete actions are available, corresponding to:

0 - move forward
1 - move backward
2 - turn left
3 - turn right
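A DQN-style agent maps the 37-dimensional state to four Q-values, one per action above, and typically explores with an epsilon-greedy rule. The sketch below is only an illustration under that assumption: this README does not say which exploration scheme the notebook uses, and `epsilon_greedy`, `STATE_SIZE`, `ACTION_SIZE` and `ACTIONS` are hypothetical names.

```python
import random

import numpy as np

# Sizes quoted in the project description above.
STATE_SIZE = 37
ACTION_SIZE = 4

# Action indices as listed above.
ACTIONS = {0: "move forward", 1: "move backward", 2: "turn left", 3: "turn right"}


def epsilon_greedy(q_values, epsilon, rng=random):
    """Hypothetical helper (assumes epsilon-greedy exploration).

    With probability `epsilon` a uniformly random action index is returned
    (explore); otherwise the index of the largest Q-value is returned (exploit).
    """
    q_values = np.asarray(q_values, dtype=float)
    assert q_values.shape == (ACTION_SIZE,)
    if rng.random() < epsilon:
        return rng.randrange(ACTION_SIZE)
    return int(np.argmax(q_values))


# Usage: with epsilon=0 the choice is always the arg-max action.
q = [0.1, 0.7, -0.2, 0.3]
a = epsilon_greedy(q, epsilon=0.0)
print(a, ACTIONS[a])  # prints: 1 move backward
```

In practice epsilon is usually decayed over training, so early episodes explore broadly and later ones mostly exploit the learned Q-values.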

The task is episodic, and in order to solve the environment, your agent must get an average score of +13 over 100 consecutive episodes.
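The solve criterion can be checked with a rolling window over per-episode scores, where each episode's score is the sum of the +1/-1 banana rewards collected in it. This is a minimal sketch, assuming "an average score of +13" means a mean of at least 13.0; `episodes_to_solve` is a hypothetical helper, not code from this repo.

```python
from collections import deque


def episodes_to_solve(scores, window=100, target=13.0):
    """Return the 1-based episode at which the mean of the last `window`
    episode scores first reaches `target`, or None if it never does."""
    recent = deque(maxlen=window)  # automatically drops scores older than `window`
    for episode, score in enumerate(scores, start=1):
        recent.append(score)
        if len(recent) == window and sum(recent) / window >= target:
            return episode
    return None


# Usage: 50 poor episodes followed by strong ones.
print(episodes_to_solve([0.0] * 50 + [20.0] * 100))  # prints 115
```

Conventions differ on how this number is reported: some training loops print the episode at which the 100-episode window ends, others subtract the window length, so episode counts are only comparable within one convention.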

Getting Started

The only additional dependency for this solution, beyond the standard project dependencies (as described in the main Udacity course repo), is PyTorch's Ignite framework, which can be installed via either pip or conda as described in the Ignite installation instructions.

Training the agent

To train the agent, run Reinforce-Unity-PER-Dueling.ipynb in this repo. There are a couple more implementations: one using plain Double DQN, and another, frankly terrible, one which applies both Double DQN and PER (prioritized experience replay). Ignore these and go straight for Reinforce-Unity-PER-Dueling.ipynb; this is where you'll find the (reasonably) good stuff. A full report is also available. I reached the solution in a non-awful 309 episodes.
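The notebooks mentioned above combine Double DQN, prioritized experience replay and, judging by the recommended notebook's name, a dueling network, but the README does not show how. Below is a minimal NumPy sketch of the three textbook formulas, not the notebook's actual code; the function names and the defaults for `gamma`, `alpha`, `beta` and `eps` are assumptions (alpha=0.6 and beta=0.4 are commonly used starting values for proportional PER).

```python
import numpy as np


def dueling_q(value, advantages):
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a').

    `value` has shape [batch, 1]; `advantages` has shape [batch, n_actions].
    """
    value = np.asarray(value, dtype=float)
    advantages = np.asarray(advantages, dtype=float)
    return value + advantages - advantages.mean(axis=1, keepdims=True)


def double_dqn_targets(rewards, dones, q_next_online, q_next_target, gamma=0.99):
    """Double DQN target: the online network selects the next action and the
    target network evaluates it; terminal transitions get no bootstrap term."""
    rewards = np.asarray(rewards, dtype=float)
    dones = np.asarray(dones, dtype=float)
    best_actions = np.argmax(q_next_online, axis=1)  # selection (online net)
    next_values = np.asarray(q_next_target, dtype=float)[
        np.arange(len(best_actions)), best_actions
    ]  # evaluation (target net)
    return rewards + gamma * next_values * (1.0 - dones)


def per_sample(td_errors, batch_size, alpha=0.6, beta=0.4, eps=1e-5, rng=None):
    """Proportional prioritized replay over a buffer of stored TD errors.

    Samples index i with probability P(i) = p_i**alpha / sum_k p_k**alpha,
    where p_i = |td_error_i| + eps, and returns importance-sampling weights
    (N * P(i))**(-beta) divided by their maximum within the sampled batch.
    """
    rng = np.random.default_rng() if rng is None else rng
    priorities = (np.abs(np.asarray(td_errors, dtype=float)) + eps) ** alpha
    probs = priorities / priorities.sum()
    indices = rng.choice(len(probs), size=batch_size, p=probs)
    weights = (len(probs) * probs[indices]) ** (-beta)
    return indices, weights / weights.max()


# Usage on tiny hand-made batches.
print(dueling_q([[1.0]], [[1.0, 2.0, 3.0, 2.0]]))  # Q-values 0, 1, 2, 1
print(double_dqn_targets(
    rewards=[1.0, -1.0],
    dones=[0.0, 1.0],
    q_next_online=[[0.2, 0.9, 0.1, 0.0], [0.5, 0.1, 0.0, 0.0]],
    q_next_target=[[1.0, 2.0, 3.0, 4.0], [10.0, 10.0, 10.0, 10.0]],
    gamma=0.5,
))  # targets 2.0 and -1.0
idx, w = per_sample([0.1, 0.5, 2.0], batch_size=4, rng=np.random.default_rng(0))
print(idx, w)
```

Dividing the weights by the batch maximum keeps them at most 1, so they only ever scale the loss down; this is a common implementation choice, and the repo's notebook may normalise differently.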


Languages: Jupyter Notebook 87.3%, Python 12.7%