Examples of random agents and agents trained to hit the ball over the net.
Note: The code in this repo is based on my original implementation of DDPG, which can be found here.
This project implements Multi-Agent Deep Deterministic Policy Gradient (MADDPG) [1] to solve the "Tennis" Unity ML environment. The goal in this environment is to train two agents to hit a ball over a net with a racquet. Each agent is rewarded for hitting the ball over the net and penalized when it hits the ball out of bounds or allows it to drop; solving the environment therefore requires both collaboration and competition. This project was completed as part of the (Unity-sponsored) Udacity course on Deep Reinforcement Learning.
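The core idea of MADDPG can be illustrated with a minimal sketch (not the repo's actual code): each agent's actor is decentralized and sees only its own observation, while each agent's critic is trained centrally on the concatenation of *all* agents' observations and actions. The dimensions below match this environment (24-dim observations, 2-dim actions, 2 agents); the placeholder `actor` function is a stand-in for a trained network.

```python
import numpy as np

OBS_DIM, ACT_DIM, N_AGENTS = 24, 2, 2

rng = np.random.default_rng(0)
observations = [rng.standard_normal(OBS_DIM) for _ in range(N_AGENTS)]

def actor(obs):
    # Placeholder policy: a real actor is a neural network, but like the
    # real one it maps ONLY the agent's own observation to a bounded action.
    return np.tanh(obs[:ACT_DIM])

actions = [actor(obs) for obs in observations]

# The centralized critic scores the joint state-action: the concatenation
# of every agent's observation and action (24*2 + 2*2 = 52 values here).
critic_input = np.concatenate(observations + actions)
assert critic_input.shape == (N_AGENTS * (OBS_DIM + ACT_DIM),)
```

At execution time only the decentralized actors are needed; the centralized critics exist purely to stabilize training.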
- **Observation space:** 24 real-valued variables describing the position and velocity of the ball and racquet. Each agent has its own local observation at each timestep.
- **Action space:** A float vector of size 2: one entry controls horizontal movement, the other controls vertical movement. Every entry is bounded to [-1, +1].
- **Rewards:** Each agent receives +0.1 for hitting the ball over the net, and -0.01 if the ball hits the ground on the agent's side or if the agent hits the ball out of bounds.
- **Solve criteria:** The environment is considered solved when the average episode score over a window of 100 episodes reaches +0.5. (Per episode, each agent's rewards are summed, and the maximum over the two agents is taken as the episode score.)
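The scoring rule above can be sketched in a few lines of Python. The reward values are hypothetical examples following the +0.1 / -0.01 scheme; this is an illustration of the metric, not the repo's training loop.

```python
from collections import deque

def episode_score(rewards_per_agent):
    """Sum each agent's rewards over the episode; the max over agents is the score."""
    return max(sum(rewards) for rewards in rewards_per_agent)

# Hypothetical per-step rewards for one short episode.
score = episode_score([
    [0.1, 0.1, -0.01],  # agent 0: two hits over the net, one dropped ball
    [0.1, 0.0, 0.0],    # agent 1: one hit over the net
])  # -> max(0.19, 0.1)

# Rolling 100-episode window used for the solve check.
scores = deque(maxlen=100)
scores.append(score)
solved = len(scores) == 100 and sum(scores) / len(scores) >= 0.5
```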
Based on the Udacity setup (see here), Conda/virtualenv can be used to install the required dependencies. For example:
```shell
virtualenv --python=python3 venv
source venv/bin/activate
pip install -r requirements.txt
```
The environment executable can be downloaded for different platforms.
A document detailing some of the implementation details and ideas for future work.
Contains a CLI used for training and visualizing the model.
The main module.
Contains the Agent class, used to organize the Policy/Q-Net of each agent.
Contains definitions of the Policy/Q-Net (Actor/Critic) models.
Contains functions for training and visualizing agents.
Various utilities for managing the environment and training loop.
Contains pretrained models.
Pre-trained Policy Nets (Actors) for each agent.
Pre-trained Q-Nets (Critics) for each agent.
The `main.py` script can be used to train agents and visualize them.
To train:
```shell
python3 main.py train
```
To visualize:
```shell
python3 main.py visualize models/
```
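A CLI with `train` and `visualize` subcommands like the one above is typically built with `argparse` subparsers. The sketch below is hypothetical (the repo's actual flags and parser may differ); it only shows the dispatch shape.

```python
import argparse

def build_parser():
    parser = argparse.ArgumentParser(description="Train or visualize MADDPG agents.")
    sub = parser.add_subparsers(dest="command", required=True)
    sub.add_parser("train", help="Train agents from scratch.")
    vis = sub.add_parser("visualize", help="Run pretrained agents.")
    vis.add_argument("model_dir", help="Directory containing saved Policy/Q-Nets.")
    return parser

# e.g. `python3 main.py visualize models/`
args = build_parser().parse_args(["visualize", "models/"])
assert args.command == "visualize" and args.model_dir == "models/"
```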