Multi-Agent Deep Deterministic Policy Gradient

Solving a Physical Deception problem using Multi-Agent Deep Deterministic Policy Gradient approach in Pytorch

Agent behaviour demo

Blue dots are the "good agents", and the Red dot is an "adversary".
All of the agents' goals are to go near the green target.
The blue agents know which one is green, but the Red agent is color-blind and does not know which target is green/black!
The optimal solution is for the red agent to chase one of the blue agent, and for the blue agents to split up and go toward each of the target.

To run the code, please use the command "./run_training.sh". The bash script cleans up and DELETE previous runs. The script is necessary because we need an extra command to ensure image rendering is possible remotely. Training takes about two hour. If you run locally on your own computer. Be sure to increase the number of parallel agents to the number of cores your computer have in main.py. GPU does not help that much in the computation.
To see a visualization of the results, run the script "./run_tensorboard.sh". A link will appear, and direct your browser to that link to see rewards over time and other statistics
The trained models are stored in "model_dir" by default. You can also find .gif animations that show how the agents are performing! The gif file contains a grid of separate parallel agents.

Solving a Physical Deception problem using Multi-Agent Deep Deterministic Policy Gradient approach

MIT License

Language:Python 99.9%Language:Shell 0.1%