KevinMathewT / RL-RoboSoccer-FirstPlace

First Place Reinforcement Learning solution code and a writeup for the AI RoboSoccer Competition.

AI RoboSoccer First Place Solution

This repository contains the code and a writeup for the first place solution of the AI RoboSoccer Competition, the flagship event conducted by the IEEE Student Chapter at the Birla Institute of Technology & Science (BITS), Pilani.

The event runs on a custom-built simulation environment made with OpenAI Gym. The organizers developed the environment on top of Gym using Pygame, with assets for both the defenders and the attackers. Feedback for every action and the state-space log are available through pre-defined functions.

You can find more about the competition here, and the repo for the OpenAI Gym environment here.

(GIF: agents taking random actions in the environment)

Solution Outline

Observation Space

The original observation space from the Gym environment provides the x and y coordinates of each player on both teams and of the ball, along with the velocity of each player.

For my solution, I modified the observation space to be the Manhattan distance and the direction vector of the ball from each player and from both goal posts, and dropped the velocities entirely. This modification was crucial to the winning solution.
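As a rough illustration, a feature transform along these lines could look like the sketch below. The function and variable names are hypothetical, not the repo's actual code; it assumes positions arrive as (x, y) pairs.

import numpy as np

def ball_features(players, ball, goals):
    """Build the modified observation: Manhattan distance and unit
    direction vector of the ball from each player and each goal post.
    `players` and `goals` are arrays of (x, y); `ball` is one (x, y)."""
    feats = []
    for point in list(players) + list(goals):
        offset = ball - point                                  # vector from point to ball
        dist = np.abs(offset).sum()                            # Manhattan distance
        direction = offset / (np.linalg.norm(offset) + 1e-8)   # unit direction
        feats.extend([dist, *direction])
    return np.asarray(feats, dtype=np.float32)

# Example: in 2v2 there are 4 players plus 2 goals = 6 reference points,
# each contributing 1 distance + 2 direction components = 18 features,
# matching the in_features=18 of the policy network printed below.
players = np.array([[10., 20.], [30., 5.], [50., 40.], [70., 15.]])
goals = np.array([[0., 25.], [100., 25.]])
ball = np.array([45., 25.])
print(ball_features(players, ball, goals).shape)  # (18,)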

Reinforcement Learning Agent

For the RL agent, I used an Advantage Actor Critic (A2C) network, implemented with Stable-Baselines3, which provides reliable PyTorch implementations of reinforcement learning algorithms.

ActorCriticPolicy(
  (features_extractor): FlattenExtractor(
    (flatten): Flatten(start_dim=1, end_dim=-1)
  )
  (mlp_extractor): MlpExtractor(
    (shared_net): Sequential()
    (policy_net): Sequential(
      (0): Linear(in_features=18, out_features=64, bias=True)
      (1): Tanh()
      (2): Linear(in_features=64, out_features=64, bias=True)
      (3): Tanh()
    )
    (value_net): Sequential(
      (0): Linear(in_features=18, out_features=64, bias=True)
      (1): Tanh()
    )
  )
  (action_net): Linear(in_features=64, out_features=20, bias=True)
  (value_net): Linear(in_features=64, out_features=1, bias=True)
)
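A network with this shape can be specified through Stable-Baselines3's policy_kwargs. A minimal sketch follows; env stands in for an instance of the competition's Gym environment, which is assumed rather than constructed here.

from stable_baselines3 import A2C

# Separate policy (two 64-unit layers) and value (one 64-unit layer)
# branches, as in the printout above. On SB3 versions before 1.8 the
# dict was wrapped in a list: net_arch=[dict(pi=[64, 64], vf=[64])].
policy_kwargs = dict(net_arch=dict(pi=[64, 64], vf=[64]))

# env: the competition's Gym environment (see its repo linked above).
model = A2C("MlpPolicy", env, policy_kwargs=policy_kwargs, verbose=1)
print(model.policy)  # prints an ActorCriticPolicy like the one above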

For training, I used the RMSpropTFLike optimizer, since it stabilized training, as suggested here.

policy_kwargs["optimizer_class"] = RMSpropTFLike
policy_kwargs["optimizer_kwargs"] = dict(
    alpha=0.99, eps=1e-5, weight_decay=0)
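Passing this dict into the constructor from the previous sketch completes the setup; a minimal continuation, with an illustrative timestep count:

model = A2C("MlpPolicy", env, policy_kwargs=policy_kwargs, verbose=1)
model.learn(total_timesteps=15_000)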

Prerequisites

Use pip to install the requirements for the repository:

$ pip install -r requirements.txt

Training

To train the model, execute:

$ python -m src.2v2.train

Similarly, you can execute python -m src.5v5.train or python -m src.10v10.train to train the agent for the 5v5 and 10v10 environments.

Models are saved in ./generated/.
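Saving and reloading a trained model follows the standard SB3 API; a short sketch, where the file name under ./generated/ is hypothetical:

from stable_baselines3 import A2C

model.save("./generated/a2c_2v2")   # hypothetical file name
model = A2C.load("./generated/a2c_2v2")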

Experiments

Average Scores over 30 Episodes for my experiments in the 2v2 environment:

| Model | Observation Space | MLP Extractor Policy Network | MLP Extractor Value Network | Training Timesteps | Score |
| --- | --- | --- | --- | --- | --- |
| A2C | Distance + direction vector of left team from ball | 2-layered (64, 64) | 2-layered (64, 64) | 10000 | 2617.0491 |
| A2C | Distance + direction vector of left team from ball | 2-layered (64, 64) | 2-layered (64, 64) | 15000 | 3033.4046 |
| A2C | Distance + direction vector of left team from ball | 2-layered (64, 64) | 2-layered (64, 64) | 20000 | 2148.8943 |
| A2C | Distance + direction vector of both teams from ball | 2-layered (64, 64) | 2-layered (64, 64) | 15000 | 2148.8943 |
| A2C | Distance + direction vector of both teams from ball + ball from both goals | 2-layered (64, 64) | 2-layered (64, 64) | 15000 | 3496.6428 |
| A2C | Distance + direction vector of both teams from ball + ball from both goals | 2-layered (64, 64) | 2-layered (64, 64) | 20000 | 5306.7699 |
| A2C | Distance + direction vector of both teams from ball + ball from both goals | 2-layered (64, 64) | 1-layered (64) | 15000 | 5471.0487 |
| A2C | Distance + direction vector of both teams from ball + ball from both goals | 2-layered (64, 64) | 1-layered (64) | 10000 | 5700.8237 |
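Averages like those above would typically be computed with SB3's evaluation helper; a sketch, where model and env come from the earlier snippets and the reward is defined by the competition environment, not this code:

from stable_baselines3.common.evaluation import evaluate_policy

# Mean episode reward over 30 episodes, as in the table above.
mean_score, std_score = evaluate_policy(model, env, n_eval_episodes=30)
print(f"score: {mean_score:.4f} +/- {std_score:.4f}")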

Rendering

You can also render and watch a game played by the trained agent by executing:

$ python -m src.2v2.render_game

Similarly, you can execute python -m src.5v5.render_game or python -m src.10v10.render_game to render a game played in the 5v5 and 10v10 environments.
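Under the hood, rendering with a trained agent generally follows the classic Gym rollout loop; a minimal sketch, assuming the competition environment exposes the pre-0.26 Gym step API and a render() method, with model and env from the earlier snippets:

# Classic (pre-0.26) Gym rollout loop with rendering.
obs = env.reset()
done = False
while not done:
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, done, info = env.step(action)
    env.render()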
