Implementations of some multi-agent deep reinforcement learning algorithms for the games Combat and Snake.
PPO: Proximal Policy Optimisation. Implementation following the OpenAI approach in https://arxiv.org/abs/1909.07528.
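For reference, the core of PPO is the clipped surrogate objective (Schulman et al., 2017). The pure-Python sketch below computes that loss over a batch of samples. It assumes per-sample log-probabilities and advantage estimates are already available. The function name ppo_clip_loss is hypothetical, and clip_eps=0.2 is a common default rather than a value taken from this repository.

```python
import math


def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Negative clipped surrogate objective, averaged over a batch.

    logp_new / logp_old: log-probabilities of the taken actions under the
    current policy and under the policy that collected the data.
    advantages: advantage estimates for the same samples (e.g. from GAE).
    clip_eps: clipping range (assumed default, not from this repository).
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)  # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic bound: keep the smaller of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    # Negate so the result can be minimised by gradient descent.
    return -total / len(advantages)
```

In a deep learning framework the same expression would be written with tensor operations (exp, clamp, minimum, mean) so that gradients flow back through logp_new.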
COMA: Counterfactual Multi-Agent Policy Gradients. https://arxiv.org/pdf/1705.08926.pdf.
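COMA's key idea is a counterfactual baseline from a centralised critic. For agent a, the advantage is A^a(s, u) = Q(s, u) - sum over u'^a of pi^a(u'^a | tau^a) * Q(s, (u^-a, u'^a)). In words, it compares the joint action actually taken with the expected value obtained by swapping out only agent a's action while the other agents' actions stay fixed. The sketch below computes this quantity for one agent. It assumes the critic has already produced Q-values for every action of agent a. The function name coma_advantage is hypothetical and not this repository's API.

```python
def coma_advantage(q_values, policy_probs, action):
    """Counterfactual advantage for a single agent a.

    q_values[k]: critic estimate Q(s, (u^-a, k)), i.e. the joint action with
    agent a's action replaced by k and all other agents' actions held fixed.
    policy_probs[k]: agent a's current policy probability pi^a(k | tau^a).
    action: index of the action agent a actually took.
    """
    if len(q_values) != len(policy_probs):
        raise ValueError("q_values and policy_probs must have the same length")
    # Counterfactual baseline: expected Q with only agent a's action marginalised out.
    baseline = sum(p * q for p, q in zip(policy_probs, q_values))
    return q_values[action] - baseline
```

Agent a's policy-gradient update then weights grad log pi^a(u^a | tau^a) by this advantage. Because the baseline does not depend on agent a's own action, it reduces variance without biasing the gradient.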
Combat: multi-agent battle environment, with implementations of PPO and COMA.
Snake: multi-agent Snake on the Jidi platform, implemented using PPO with self-play.
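In self-play, the agent trains against copies of its own policy. One common variant keeps a bounded pool of frozen past snapshots and samples opponents from it, so the learner does not overfit to only its latest self. The sketch below shows that idea under assumptions: the class name SelfPlayPool, its parameters, and the "latest vs. uniform" sampling rule are all hypothetical and are not taken from this repository. Policy parameters are treated as an opaque copyable object, for instance a PyTorch state_dict.

```python
import copy
import random
from collections import deque


class SelfPlayPool:
    """Hypothetical pool of frozen past policies used as self-play opponents."""

    def __init__(self, max_size=10, latest_prob=0.5, seed=None):
        # Bounded history: the oldest snapshot is dropped once max_size is reached.
        self.snapshots = deque(maxlen=max_size)
        self.latest_prob = latest_prob
        self.rng = random.Random(seed)

    def add_snapshot(self, policy_params):
        # Deep-copy so later training updates do not mutate the frozen opponent.
        self.snapshots.append(copy.deepcopy(policy_params))

    def sample_opponent(self):
        if not self.snapshots:
            raise ValueError("pool is empty; add a snapshot first")
        # With probability latest_prob play the most recent snapshot,
        # otherwise a uniformly random snapshot from the whole pool.
        if self.rng.random() < self.latest_prob:
            return self.snapshots[-1]
        return self.rng.choice(list(self.snapshots))
```

A typical loop would call add_snapshot every fixed number of PPO updates. It would then call sample_opponent at the start of each episode to choose the policy controlling the other snakes.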