Unity's Banana Collector Environment is an environment in which an agent must collect as many yellow bananas (+1 reward each) as possible while avoiding blue bananas (-1 reward each).
The agent interacts with the environment as follows:
- It observes the current state as a vector of 37 elements
- It can choose one of 4 actions: Left, Forward, Right, or Back
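For intuition, an agent choosing among the 4 actions from a 37-element state vector typically follows an epsilon-greedy policy during training. The sketch below is illustrative only; the `q` values stand in for the output of a trained Q-network and are not the repository's actual code:

```python
import numpy as np

N_ACTIONS = 4    # Left, Forward, Right, Back
STATE_SIZE = 37  # length of the observation vector

def select_action(q_values, eps, rng):
    """Epsilon-greedy: explore with probability eps, otherwise act greedily."""
    if rng.random() < eps:
        return int(rng.integers(N_ACTIONS))
    return int(np.argmax(q_values))

rng = np.random.default_rng(0)
state = rng.standard_normal(STATE_SIZE)      # placeholder 37-dim observation
q = np.array([0.1, 0.5, -0.2, 0.0])          # placeholder Q-values for the 4 actions
action = select_action(q, eps=0.0, rng=rng)  # eps=0 -> greedy -> action 1 (Forward)
```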
Sample image taken from: https://github.com/udacity/deep-reinforcement-learning/tree/master/p1_navigation
This repository trains an agent, using the Double DQN reinforcement learning algorithm, to attain an average score of at least 13 over 100 consecutive episodes.
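For reference, Double DQN differs from vanilla DQN in how the TD target is formed: the online network *selects* the best next action, while the target network *evaluates* it, which reduces the overestimation bias of a single max. A minimal numpy sketch of the target computation (the Q-value arrays are placeholders, not the repository's code):

```python
import numpy as np

def double_dqn_target(reward, q_online_next, q_target_next, gamma, done):
    """TD target: r + gamma * Q_target(s', argmax_a Q_online(s', a))."""
    best_action = int(np.argmax(q_online_next))  # online net selects the action
    bootstrap = q_target_next[best_action]       # target net evaluates it
    return reward + gamma * bootstrap * (1.0 - done)

q_online_next = np.array([0.2, 1.0, 0.3, 0.1])  # placeholder online-net Q-values at s'
q_target_next = np.array([0.5, 0.4, 0.9, 0.2])  # placeholder target-net Q-values at s'
target = double_dqn_target(1.0, q_online_next, q_target_next, gamma=0.99, done=0.0)
# online net picks action 1; target net evaluates it as 0.4 -> 1.0 + 0.99 * 0.4
```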
- Anaconda
- Python 3.6
- A `conda` environment, created as follows:
  - Linux or Mac:
    ```
    conda create --name drlnd python=3.6
    source activate drlnd
    ```
  - Windows:
    ```
    conda create --name drlnd python=3.6
    activate drlnd
    ```
- Required dependencies:
  ```
  git clone https://github.com/udacity/deep-reinforcement-learning.git
  cd deep-reinforcement-learning/python
  pip install .
  ```
- Clone this repository and move into it:
  ```
  git clone https://github.com/JoshVarty/BananaCollector_DoubleQLearning.git
  cd BananaCollector_DoubleQLearning
  ```
- Download the Unity Banana Collector Environment:
  - Linux: click here
  - Linux Headless: click here
  - Mac OSX: click here
  - Windows (32-bit): click here
  - Windows (64-bit): click here
- Unzip it into the git directory
- Start Jupyter:
  ```
  jupyter notebook
  ```
- Train your own agent via `DQN.ipynb`, or watch a single episode of the pre-trained network via `Visualization.ipynb`
In my experience the agent achieves an average score of 13 after ~400 episodes of training:
*A sample run generated from `Visualization.ipynb`*
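The solve criterion above (an average score of at least 13 over 100 consecutive episodes) amounts to a rolling mean over the episode scores. A minimal sketch of checking it (the score values below are made up, not results from this repository):

```python
from collections import deque
import numpy as np

def first_solved_episode(scores, target=13.0, window=100):
    """Return the 1-based episode index at which the rolling mean over the
    last `window` episodes first reaches `target`, or None if it never does."""
    recent = deque(maxlen=window)
    for i, s in enumerate(scores, start=1):
        recent.append(s)
        if len(recent) == window and np.mean(recent) >= target:
            return i
    return None

# Made-up history: 100 episodes scoring 0, then 400 episodes scoring 14.
scores = [0.0] * 100 + [14.0] * 400
print(first_solved_episode(scores))  # → 193 (93 fourteens in the window: 13.02 >= 13)
```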
- Only tested on Ubuntu 18.04
- Details of the learning algorithm and chosen architecture may be found in `Report.md`