scimone / PommermanChallenge

Training a Deep Reinforcement Learning agent to participate in the Pommerman Challenge

Pommerman Challenge

This notebook trains a Deep Reinforcement Learning (DRL) agent for the Pommerman Challenge. The Pommerman Tournament consists of one-on-one battles on a 6x6 grid, in which the agent must navigate the board, compete against an opponent, and make strategic decisions to win.

Key Facts

  • Pommerman Tournament: The challenge revolves around one-on-one battles on a 6x6 grid.
  • Varying Starting Positions: The agent's starting position varies between matches: 0 represents the top left, 1 the bottom right.
  • Sparse and Delayed Rewards: Rewards are sparse and delayed, so the agent must plan ahead and strategize.
  • Noisy Reward Signal: Opponents may kill themselves, which adds noise to the reward signal because a win does not always reflect the agent's own play.

Approach

The chosen approach involves employing a Deep Q-Network (DQN). Key components of the approach:

  • Deep Q-Network: DQN with separate training and target value functions to improve stability and convergence.
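  • Replay Buffer: Consecutive transitions within an episode are strongly correlated. A replay buffer stores past transitions and samples them at random, so that training batches are closer to independent and identically distributed (i.i.d.).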
  • Agent Network: The best-performing agent network is used as the basis for both the training and target value functions.
  • Opponents: The training involves pitting the agent against various opponents. These include four other trained agents, a simple agent, and a random agent.
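
The replay buffer and the target value function described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the notebook's actual code: the names `ReplayBuffer` and `td_target`, the buffer capacity, and the discount factor `gamma` are all assumptions made for the example.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of transitions (hypothetical helper, not from the notebook)."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest transition when full.
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random sampling breaks the temporal correlation
        # between consecutive transitions of the same episode.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def td_target(reward, next_q_values, done, gamma=0.99):
    """One-step DQN target.

    next_q_values are assumed to come from the *target* value function,
    not from the one currently being trained.
    """
    if done:
        # Terminal transition: there is no future value to bootstrap from.
        return reward
    return reward + gamma * max(next_q_values)
```

In a full training loop, the training value function would be regressed toward `td_target` on each sampled batch, and the target value function would be refreshed from it only periodically, which is what keeps the targets stable.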

Training

The training process consists of the following steps:

  • The agent is trained across 2000 episodes against each of the six opponents.
  • The total number of training episodes sums up to 6 * 2000 = 12000 episodes.
  • Training begins with the agent at starting position 1, the opposite of the position it was previously trained on. Subsequent training sessions alternate between the two starting positions.
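
The training steps above can be laid out as a simple schedule. The sketch below assumes that one "training session" corresponds to the 2000 episodes played against a single opponent and that the starting position flips between sessions; the opponent labels and the `build_schedule` helper are hypothetical and only serve to illustrate the arithmetic.

```python
EPISODES_PER_OPPONENT = 2000

# Hypothetical labels for the six opponents: four trained agents,
# a simple agent, and a random agent.
OPPONENTS = ["trained_1", "trained_2", "trained_3", "trained_4", "simple", "random"]


def build_schedule(opponents, episodes_per_opponent, first_position=1):
    """Return a list of (opponent, starting_position, episodes) sessions.

    Assumption: the starting position alternates between 1 (bottom right)
    and 0 (top left) from one session to the next.
    """
    schedule = []
    position = first_position
    for opponent in opponents:
        schedule.append((opponent, position, episodes_per_opponent))
        position = 1 - position  # flip between 0 and 1
    return schedule


schedule = build_schedule(OPPONENTS, EPISODES_PER_OPPONENT)
total_episodes = sum(episodes for _, _, episodes in schedule)
print(total_episodes)  # 6 * 2000 = 12000
```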

About

License:MIT License


Languages

Language:Jupyter Notebook 100.0%