
Training a Reinforcement Learning Agent to Play Flappy Bird

In this project, I trained multiple agents to play the game Flappy Bird using Advantage Actor-Critic (A2C) and Proximal Policy Optimization (PPO). There are two environments to choose from: one that represents the state using only player coordinates (simple environment) and one that uses raw pixel values (rgb environment). There are also two policies to choose from: one based on a multilayer perceptron (mlp policy) and one based on convolutional neural networks (cnn policy).
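The MLP/CNN policy names and the best_model.zip checkpoints referenced below suggest the project builds on stable-baselines3 together with the bundled flappy_bird_gym environment. As a minimal sketch only (assuming stable-baselines3 and the FlappyBird-v0 / FlappyBird-rgb-v0 environment IDs from the original flappy-bird-gym package, not this repository's exact code), training an agent looks roughly like this:

import flappy_bird_gym
from stable_baselines3 import PPO

# Simple environment: observations are coordinates, so an MLP policy fits.
# Environment ID and hyperparameters here are illustrative assumptions.
env = flappy_bird_gym.make("FlappyBird-v0")
model = PPO("MlpPolicy", env, learning_rate=1e-5, gamma=0.95, verbose=1)
model.learn(total_timesteps=1_000_000)
model.save("saved_models/my_model")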

Install

Dependencies:

  • Python 3.8.3
  • the packages listed in requirements.txt

Instructions:

  • Clone flappy-bird-reinforcement-learning
  • cd into flappy-bird-reinforcement-learning
  • Create and activate a Python virtual environment (see the example below)
  • Install the packages from requirements.txt:
$ python -m pip install -r requirements.txt
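For reference, the virtual-environment steps typically look like this (standard venv; on Windows, run venv\Scripts\activate instead of source):

$ python -m venv venv
$ source venv/bin/activate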

Project Structure

Root folder

| File/Folder | Description |
| --- | --- |
| train_agent.py | Train or run a reinforcement learning agent using a config file. |
| hyperparameter_tuning.py | Run hyperparameter tuning with the given parameters. |
| config | Folder containing training/run configs. |
| flappy_bird_gym | Folder containing the Flappy Bird gym environment, originally created by @Talendar and slightly modified by @doravalerie and me. |
| requirements.txt | Lists all packages used for the project. Designed to be used with pip. |
| imgs | Folder containing the images displayed in this file. |
| README.md | This file. |

Config files

Training and runtime parameters are stored in a YAML file, which must be placed in the config folder and have the following structure:

type: [simple/rgb]
hyperparameter:
    algorithm: [A2C/PPO]
    policy: [MLP/CNN]
    learning_rate: float
    gamma: float
checkpoints:
    prefix: str
total_timesteps: int
eval_freq: int
frame_stack: Optional[int]
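For example, a config for training PPO with a CNN policy on the rgb environment could look like this (file name and values are purely illustrative):

type: rgb
hyperparameter:
    algorithm: PPO
    policy: CNN
    learning_rate: 0.00001
    gamma: 0.95
checkpoints:
    prefix: ppo_cnn_rgb
total_timesteps: 5000000
eval_freq: 10000
frame_stack: 4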

Train an agent

To train an agent, run train_agent.py with the --train flag and specify the config file:

$ python .\train_agent.py --train --config .\config\my_config.yml

New folders within saved_models and logs will be created for the results.

Run a trained agent

To run a trained agent, run train_agent.py with the --test flag and specify both the config file and the path to the trained model:

$ python .\train_agent.py --test --config .\config\my_config.yml --model_path .\saved_models\my_model\best_models\best_model.zip
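Internally, running a trained agent amounts to loading the checkpoint and stepping the environment with the policy's actions. A rough sketch (assuming stable-baselines3 and the gym-style API; not the script's exact code):

import flappy_bird_gym
from stable_baselines3 import PPO

env = flappy_bird_gym.make("FlappyBird-v0")  # env ID is an assumption
model = PPO.load("saved_models/my_model/best_models/best_model.zip")

obs = env.reset()
done, steps = False, 0
while not done:
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, done, info = env.step(action)
    env.render()
    steps += 1  # survived steps, the performance metric used in Results
print(f"Survived {steps} steps")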

Hyperparameter Tuning

To start hyperparameter tuning, run hyperparameter_tuning.py with the following arguments:

$ python .\hyperparameter_tuning.py --algorithm a2c --timesteps 5000000 --env rgb --policy cnn --trials 20

| Parameter | Description | Possible Values | Required |
| --- | --- | --- | --- |
| --algorithm | Algorithm used to train the model. | a2c, ppo | Yes |
| --timesteps | Maximum number of timesteps to train a model during tuning. | int | Yes |
| --env | Environment to use. | simple, rgb | Yes |
| --policy | Policy to use. | mlp, cnn | Yes |
| --trials | Number of tuning trials. | int | Yes |

New folders within saved_models_hyperparameter and logs_hyperparameter will be created for the results.
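Trial-based tuning over learning rate, gamma, and n_steps (the parameters reported under Results) is commonly implemented with a framework such as Optuna. The following sketch shows the general pattern only; the use of Optuna and the search ranges are assumptions, not the repository's actual implementation:

import optuna
import flappy_bird_gym
from stable_baselines3 import A2C

def objective(trial):
    # Illustrative search spaces, roughly matching the values in the tables below.
    lr = trial.suggest_float("learning_rate", 1e-5, 1e-3, log=True)
    gamma = trial.suggest_float("gamma", 0.90, 0.99)
    n_steps = trial.suggest_categorical("n_steps", [256, 512, 1024, 2048, 4096, 8192])

    env = flappy_bird_gym.make("FlappyBird-rgb-v0")
    model = A2C("CnnPolicy", env, learning_rate=lr, gamma=gamma, n_steps=n_steps)
    model.learn(total_timesteps=100_000)

    # Score a trial by the mean number of survived steps over a few episodes.
    lengths = []
    for _ in range(5):
        obs, done, steps = env.reset(), False, 0
        while not done:
            action, _ = model.predict(obs, deterministic=True)
            obs, _, done, _ = env.step(action)
            steps += 1
        lengths.append(steps)
    return sum(lengths) / len(lengths)

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20)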

Results

The performance of a model is measured as the number of steps (frames) survived.

Custom Configs

| Environment | Config | Learning Rate | Gamma | Best Result (steps) | Training Time |
| --- | --- | --- | --- | --- | --- |
| Simple | PPO MLP | 1e-5 | 0.95 | 101 | 4h |
| RGB | PPO MLP | 1e-5 | 0.95 | 102 | 17h |
| RGB | PPO CNN | 1e-5 | 0.95 | 450 | 15h |
| Simple | A2C MLP | 7e-4 | 0.99 | 1800 | 1h |
| RGB | A2C MLP | 7e-4 | 0.99 | 101 | 30h |
| RGB | A2C MLP | 7e-5 | 0.95 | 101 | 30h |
| RGB | A2C CNN | 7e-4 | 0.99 | 101 | 30h |
| RGB | A2C CNN | 7e-5 | 0.95 | 318 | 30h |

Hyperparameter Tuning

| Environment | Config | Learning Rate | Gamma | n_steps | Best Result (steps) |
| --- | --- | --- | --- | --- | --- |
| Simple | PPO MLP | 1e-3 | 0.99 | 4096 | 1840 |
| RGB | PPO MLP | 1e-3 | 0.90 | 256 | 101 |
| RGB | PPO CNN | 1e-3 | 0.98 | 2048 | 103 |
| Simple | A2C MLP | 1e-3 | 0.98 | 1024 | 113 |
| RGB | A2C MLP | 1e-5 | 0.98 | 512 | 101 |
| RGB | A2C CNN | 1e-5 | 0.95 | 8192 | 101 |

License

MIT License

