
MADDPG KERAS Implementation

Implementation of the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm in Keras, with very simple customization. Link to the paper: https://arxiv.org/pdf/1706.02275.pdf

The previous version of the code is available in the v0.1 branch.

Table of Contents

  • Project Description
  • Features
  • Installation
  • Usage
  • Code Structure
  • Possible Enhancements
  • How to Contribute
  • Support
  • License

Project Description

  • This is an implementation of the MADDPG algorithm in TensorFlow Keras and is easy to understand
  • OpenAI's MADDPG implementation is in TensorFlow v1, making it difficult to understand for those accustomed to TensorFlow v2 and Keras
  • This implementation is built on the DDPG implementation on the Keras website; have a look at that DDPG implementation as well
  • This repository is a good starting point for those looking to customize a MADDPG implementation

Features

  • This implementation has been successfully tested on the competitive environment of the 2-pursuer, 1-evader problem

  • This implementation works for any number (n) of agents, which can be decided by the user

  • To work with this implementation, the user only needs to create a new env.py file defining the environment

  • Here is the reward curve generated by this implementation for the 2-pursuer, 1-evader environment after training for 3000 episodes (maddpg figure)

  • Also check the short animation generated by the trained model (using this implementation) for the 2-pursuer, 1-evader environment

maddpg-keras.mp4
  • The implementation is very well documented and, given its simplicity, easy to understand
  • The author of the code can be contacted directly by email (prshukla.edu@gmail.com) or LinkedIn in case of issues
  • Please note that GPU training is currently not supported; please contact the author of the code if an enhancement to add GPU support for training is needed
  • It takes around 20 hours to train 3 agents in the 2-pursuer, 1-evader environment for 3000 episodes (100 steps in each episode) on a single i5-113G7 processor

Installation

For installation, run the following commands in a terminal:

git clone https://github.com/pr-shukla/maddpg-keras.git
cd maddpg-keras
pip install -r requirements.txt

Usage

  1. To train on the same 2-pursuer, 1-evader competitive environment, run the following command in the root folder:
python3 train.py
  2. You can create a custom environment in env.py and then repeat step 1. The env.py file should have the following class and methods:
class Environment:
    def __init__(self):
        pass
    def initial_obs(self):
        '''
        Define initial observation state of your environment
        '''
    def step(self, action):
        '''
        Execute step and calculate new observation state
        '''

def reward(state):
    '''
    Calculate reward given new state
    '''
  3. You may want to change the values of parameters like STD_DEV, GAMMA, TAU in config.py for a custom environment (an illustrative config.py sketch is shown after this list)
  4. To quickly see the results of the author's previous training, you can run predict.py as follows (trained models are saved in the saved_models folder):
python3 predict.py
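
For reference, below is a minimal sketch of what the relevant entries in config.py might look like. The parameter names STD_DEV, GAMMA, TAU, NUM_EPISODES and NUM_STEPS come from this repository, but the values shown are illustrative placeholders, not the repository's defaults.

# config.py -- illustrative values only, check the repository for the actual defaults
NUM_EPISODES = 3000   # number of training episodes
NUM_STEPS    = 100    # steps per episode
GAMMA        = 0.99   # discount factor for future rewards
TAU          = 0.005  # soft-update rate for target networks
STD_DEV      = 0.2    # standard deviation of the exploration noise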

Code Structure

  • The code contains three directories: maddpg (code for the MADDPG implementation), env (training and prediction environment code), and saved_models (pretrained models)
  • train.py: Main training code
  • config.py: Defines training parameters like NUM_EPISODES, NUM_STEPS
  • predict.py: Code for testing the trained model on the prediction environment
  • maddpg/buffer.py: 1. Calculates gradients and updates the critic and actor models 2. Maintains the buffer of experience
  • maddpg/model.py: Creates the neural network models for the actor and critic
  • maddpg/noise.py: Creates random noise which is added to the predicted action for more exploration
  • env/env.py: The training environment is defined here
  • env/env_predict.py: The prediction/testing environment is defined here
  • Please refer to the MADDPG algorithm (the maddpg_algo figure) while going through the code
  • Gradient calculation steps are extensively documented in the Buffer.learn() method in buffer.py; a rough sketch of these update steps is shown below this list
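
As a rough illustration of the gradient steps documented in Buffer.learn(), here is a minimal MADDPG-style critic and actor update written with TensorFlow 2 / Keras GradientTape. This is a sketch, not the repository's code: the names (actor_models, critic_models, target_actors, target_critics, the batch tensors) are assumptions, and optimizer steps and target-network soft updates are omitted.

import tensorflow as tf

# Sketch of a MADDPG-style update for agent i (illustrative, not the repository's code).
# Each critic sees the joint observations and joint actions of all agents;
# each actor sees only its own observation.
def update_agent(i, actor_models, critic_models, target_actors, target_critics,
                 obs_batch, act_batch, rew_batch, next_obs_batch, gamma):
    n = len(actor_models)

    # Target actions of every agent for the next observations
    next_actions = [target_actors[j](next_obs_batch[j]) for j in range(n)]

    # Critic update: minimise the TD error against the target critic
    with tf.GradientTape() as tape:
        y = rew_batch[i] + gamma * target_critics[i](
            [tf.concat(next_obs_batch, axis=1), tf.concat(next_actions, axis=1)])
        q = critic_models[i](
            [tf.concat(obs_batch, axis=1), tf.concat(act_batch, axis=1)])
        critic_loss = tf.reduce_mean(tf.square(y - q))
    critic_grads = tape.gradient(critic_loss, critic_models[i].trainable_variables)

    # Actor update: maximise the critic's value of the actions produced by the
    # current policies; gradients are taken only w.r.t. agent i's actor weights
    with tf.GradientTape() as tape:
        actions = [actor_models[j](obs_batch[j]) for j in range(n)]
        actor_loss = -tf.reduce_mean(critic_models[i](
            [tf.concat(obs_batch, axis=1), tf.concat(actions, axis=1)]))
    actor_grads = tape.gradient(actor_loss, actor_models[i].trainable_variables)

    return critic_grads, actor_grads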

Possible Enhancements

Updates as of Dec 10, 2023

  • Training on GPU is not supported; contributions to add this enhancement are welcome
  • The implementation has been tested with TensorFlow versions 2.3 and 2.8; more recent versions may not work
  • Currently the time complexity of training is O(batch size); please look at the implementation in buffer.py for more details, and see the illustrative sketch after this list
  • The implementation works only for agents performing single-dimensional actions, not multi-dimensional actions
  • Actions are unnecessarily calculated in the Buffer.learn() method in buffer.py. Search for @bug in buffer.py for more details
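
To illustrate the O(batch size) point above: computing quantities with a per-sample Python loop scales linearly with the batch size, whereas a vectorised TensorFlow call processes the whole batch at once. This is a generic sketch of the enhancement opportunity, not the repository's code.

import tensorflow as tf

# Per-sample loop: Python-level cost grows with the batch size (illustrative only)
def td_targets_loop(rewards, next_q_values, gamma):
    targets = []
    for r, q in zip(rewards, next_q_values):
        targets.append(r + gamma * q)
    return tf.stack(targets)

# Vectorised equivalent: a single tensor operation over the whole batch
def td_targets_vectorised(rewards, next_q_values, gamma):
    return rewards + gamma * next_q_values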

How to Contribute

  • To contribute, simply create an issue and raise a pull request.

Support

  • For any support related to the implementation, you can either raise an issue or directly send an email to prshukla.edu@gmail.com

License

Licensed under the MIT License.
