alphago alphago-zero alphagozero reinforcement-learning pytorch ddpg ddpg-algorithm ddpg-agent reinforcement-learning-algorithms reinforcement-learning-agent deep-reinforcement-learning

Competitive Tennis Agents Training with Deep Reinforcement Learning DDPG

Introduction

This repository contains a Deep Deterministic Policy Gradient (DDPG) agent running in the Unity ML-Agents Tennis (https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Learning-Environment-Examples.md#tennis) environment. It can be used to train the agent and to evaluate the trained result.

It extends my previous project, Learning Continuous Control in Deep Reinforcement Learning, to try out "AlphaGo Zero"-like competitive agent training, in which two agents know nothing about the rules at the beginning and gradually gain experience and rewards by playing against each other.

The DDPG agent is implemented in Python 3 using PyTorch.
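
DDPG keeps slowly updated target copies of its actor and critic networks. As a minimal sketch of that soft update, the snippet below uses plain Python lists in place of PyTorch parameters. The function name `soft_update` and the default `tau` value are illustrative assumptions, not taken from this repository.

```python
def soft_update(local_params, target_params, tau=1e-3):
    """Blend local weights into the target weights.

    Computes target <- tau * local + (1 - tau) * target, element by element.
    `tau` is a hypothetical default; the repository's value may differ.
    """
    return [tau * l + (1.0 - tau) * t for l, t in zip(local_params, target_params)]


# Toy weights standing in for network parameters.
local = [1.0, 2.0]
target = [0.0, 0.0]
target = soft_update(local, target, tau=0.5)
```

A small `tau` makes the target networks trail the local networks slowly, which is the usual way DDPG stabilises its learning targets.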

The provided model weights were trained on an AWS EC2 p2.xlarge GPU instance in about 1 hour 15 minutes.

Environment

The 3D environment contains two tennis agents that can move forward, move backward, or jump.

Goal

The goal is to bounce the ball over the net to the other side without dropping it or sending it out of bounds.

Environment Solved Criteria

The environment is considered solved when the maximum score of either agent, averaged over the last 100 episodes, reaches at least 0.5.
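
The criterion above can be sketched as a rolling check over per-episode scores. This is a minimal illustration that assumes each episode yields one score per agent; the function name `is_solved` and its parameters are hypothetical and not part of this repository.

```python
from collections import deque


def is_solved(episode_scores, window=100, target=0.5):
    """Return True once the rolling average of per-episode max scores hits `target`.

    episode_scores: iterable of (score_agent_0, score_agent_1), one pair per episode.
    """
    recent = deque(maxlen=window)  # holds the max score of the last `window` episodes
    for scores in episode_scores:
        recent.append(max(scores))
        if len(recent) == window and sum(recent) / window >= target:
            return True
    return False
```

For example, `is_solved([(0.6, 0.1)] * 100)` returns `True`, while the same scores over only 99 episodes return `False` because the window is not yet full.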

Rewards

A reward of +0.1 is given to an agent each time it hits the ball over the net to the other side.
A reward of -0.1 is given to an agent each time it lets the ball drop or hits it out of bounds.
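
To illustrate how these rewards add up, the sketch below sums hypothetical per-step rewards into one episode score per agent. The helper name `episode_scores` and the sample reward sequence are made up for illustration and do not come from the environment.

```python
def episode_scores(step_rewards, num_agents=2):
    """Sum per-step rewards into one total score per agent."""
    scores = [0.0] * num_agents
    for rewards in step_rewards:
        for i, r in enumerate(rewards):
            scores[i] += r
    return scores


# Hypothetical rally: agent 0 returns the ball twice, agent 1 returns it once
# and then lets it drop.
steps = [(0.1, 0.0), (0.0, 0.1), (0.1, 0.0), (0.0, -0.1)]
totals = episode_scores(steps)
```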

Actions

Vector Action space: (Continuous) Size of 2, corresponding to moving forward or backward, and jumping.
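
Because the action vector is continuous, an agent typically clips its output, plus any exploration noise, to a valid range before sending it to the environment. The sketch below assumes a range of [-1, 1], which this README does not state; the helper name `clip_action` is also hypothetical.

```python
def clip_action(action, low=-1.0, high=1.0):
    """Clamp each action component to [low, high] (the range is an assumption)."""
    return [min(max(a, low), high) for a in action]


# A size-2 action whose first component exceeds the assumed upper bound.
clipped = clip_action([1.7, -0.3])
```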

Spaces

The observation space is composed of 8 variables corresponding to the position and velocity of the ball and racket.

Getting Started

Clone this Git repository https://github.com/kinwo/deeprl-tennis-competition
Install Unity ML https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Installation.md
Download the Unity ML environment from one of the links below based on your OS:
- Linux: click here
- Mac OSX: click here
- Windows (32-bit): click here
- Windows (64-bit): click here

Then unzip the file and place the extracted environment in the project folder.

Create Conda Environment

Install conda from https://conda.io. Create a new Conda environment with Python 3.6.

conda create --name deeprl python=3.6
source activate deeprl

Install Dependencies

cd python
pip install .

How to run the agent

Start Jupyter Notebook

jupyter notebook

To start training, open Tennis.ipynb in Jupyter Notebook and follow the instructions there.

Trained model Weights

Trained model weights are included so you can quickly run the agent and see the result in Unity ML-Agents. Skip the training step and run the last step of Tennis.ipynb.

About

Learning to play tennis from scratch with AlphaGo Zero style self-play using DDPG

