Unity's Banana Collector Environment is an environment in which an agent must collect as many yellow bananas (+1 reward each) as possible while avoiding blue bananas (-1 reward each).
The agent interacts with the environment as follows:
- It observes the current state as a vector of 37 elements
- It can choose one of 4 actions: Left, Forward, Right, or Back
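For intuition, an agent choosing among the 4 actions from a 37-element state vector typically follows an epsilon-greedy policy during training. The sketch below is illustrative only; the `q` values stand in for the output of a trained Q-network and are not the repository's actual code:

```python
import numpy as np

N_ACTIONS = 4    # Left, Forward, Right, Back
STATE_SIZE = 37  # length of the observation vector

def select_action(q_values, eps, rng):
    """Epsilon-greedy: explore with probability eps, otherwise act greedily."""
    if rng.random() < eps:
        return int(rng.integers(N_ACTIONS))
    return int(np.argmax(q_values))

rng = np.random.default_rng(0)
state = rng.standard_normal(STATE_SIZE)      # placeholder 37-dim observation
q = np.array([0.1, 0.5, -0.2, 0.0])          # placeholder Q-values for the 4 actions
action = select_action(q, eps=0.0, rng=rng)  # eps=0 -> greedy -> action 1 (Forward)
```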
Sample image taken from: https://github.com/udacity/deep-reinforcement-learning/tree/master/p1_navigation
This repository trains an agent, using the Double DQN reinforcement learning algorithm, to attain an average score of at least 13 over 100 consecutive episodes.
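For reference, Double DQN differs from vanilla DQN in how the TD target is formed: the online network *selects* the best next action, while the target network *evaluates* it, which reduces the overestimation bias of a single max. A minimal numpy sketch of the target computation (the Q-value arrays are placeholders, not the repository's code):

```python
import numpy as np

def double_dqn_target(reward, q_online_next, q_target_next, gamma, done):
    """TD target: r + gamma * Q_target(s', argmax_a Q_online(s', a))."""
    best_action = int(np.argmax(q_online_next))  # online net selects the action
    bootstrap = q_target_next[best_action]       # target net evaluates it
    return reward + gamma * bootstrap * (1.0 - done)

q_online_next = np.array([0.2, 1.0, 0.3, 0.1])  # placeholder online-net Q-values at s'
q_target_next = np.array([0.5, 0.4, 0.9, 0.2])  # placeholder target-net Q-values at s'
target = double_dqn_target(1.0, q_online_next, q_target_next, gamma=0.99, done=0.0)
# online net picks action 1; target net evaluates it as 0.4 -> 1.0 + 0.99 * 0.4
```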
- Anaconda
- Python 3.6
- A `conda` environment, created as follows:
  - Linux or Mac:
    ```
    conda create --name drlnd python=3.6
    source activate drlnd
    ```
  - Windows:
    ```
    conda create --name drlnd python=3.6
    activate drlnd
    ```
- Required dependencies:
  ```
  git clone https://github.com/udacity/deep-reinforcement-learning.git
  cd deep-reinforcement-learning/python
  pip install .
  ```
- Clone this repository and move into it:
  ```
  git clone https://github.com/JoshVarty/BananaCollector_DoubleQLearning.git
  cd BananaCollector_DoubleQLearning
  ```
- Download the Unity Banana Collector Environment:
  - Linux: click here
  - Linux Headless: click here
  - Mac OSX: click here
  - Windows (32-bit): click here
  - Windows (64-bit): click here
- Unzip it into the git directory
- Start Jupyter:
  ```
  jupyter notebook
  ```
- Train your own agent via `DQN.ipynb`, or watch a single episode of the pre-trained network via `Visualization.ipynb`
In my experience the agent achieves an average score of 13 after ~400 episodes of training:
*A sample run generated from `Visualization.ipynb`*
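The solve criterion above (an average score of at least 13 over 100 consecutive episodes) amounts to a rolling mean over the episode scores. A minimal sketch of checking it (the score values below are made up, not results from this repository):

```python
from collections import deque
import numpy as np

def first_solved_episode(scores, target=13.0, window=100):
    """Return the 1-based episode index at which the rolling mean over the
    last `window` episodes first reaches `target`, or None if it never does."""
    recent = deque(maxlen=window)
    for i, s in enumerate(scores, start=1):
        recent.append(s)
        if len(recent) == window and np.mean(recent) >= target:
            return i
    return None

# Made-up history: 100 episodes scoring 0, then 400 episodes scoring 14.
scores = [0.0] * 100 + [14.0] * 400
print(first_solved_episode(scores))  # → 193 (93 fourteens in the window: 13.02 >= 13)
```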
- Only tested on Ubuntu 18.04
- Details of the learning algorithm and chosen architecture may be found in `Report.md`