will-bell / procgen-adr

Implementation of Automatic Domain Randomization (ADR) and Proximal Policy Optimization (PPO) to improve generalizability of reinforcement learning agents in playing arcade games in OpenAI's Procgen environment

procgen-adr

Procgen ADR is a Python implementation of OpenAI's Automatic Domain Randomization (ADR).

Team Members include: William Bell (wjbell@bu.edu), Tu Timmy Hoang (hoangt@bu.edu), David McIntyre (dpmc@bu.edu), Danny Trinh (djtrinh@bu.edu)

Installation

We created a fork of OpenAI's baselines, which provides useful reinforcement learning modules. We specifically use PPO and the IMPALA CNN. In the fork, we updated the repo to be compatible with TensorFlow 2.0.0. Install from source: https://github.com/tuthoang/baselines

We also forked OpenAI's procgen in order to make customizable environments. Install from source: https://github.com/will-bell/procgen

Usage

python -m baselines_adr.train --env_name dc_bossfight --n_train_envs 128 --n_training_steps 200000000 --log_dir ./recurr --recur True

This trains a recurrent policy on our game, dc_bossfight, in 128 parallel environments for 200 million training steps (the value passed to --n_training_steps). Models and progress are saved periodically in /adr_experiments/{some unique identifier}/recurr.
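As a rough illustration of how the flags in the usage line could be read, here is a minimal argparse sketch. It is not the actual contents of train.py: the flag names come from the command above, while the defaults, the str2bool helper, and the per-environment arithmetic are assumptions made for the example. The helper exists because a value such as "--recur True" arrives as a string, and Python's bool() would treat the string "False" as true.

```python
import argparse


def str2bool(value: str) -> bool:
    """Parse 'True'/'False'-style strings; bool('False') would be True."""
    lowered = value.strip().lower()
    if lowered in {"true", "1", "yes"}:
        return True
    if lowered in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    # Flag names mirror the usage line; the defaults are hypothetical.
    parser = argparse.ArgumentParser(description="Illustrative ADR training CLI")
    parser.add_argument("--env_name", type=str, required=True)
    parser.add_argument("--n_train_envs", type=int, default=128)
    parser.add_argument("--n_training_steps", type=int, default=200_000_000)
    parser.add_argument("--log_dir", type=str, default="./logs")
    parser.add_argument("--recur", type=str2bool, default=False)
    return parser


# Parse the same arguments as the usage line instead of reading sys.argv.
args = build_parser().parse_args(
    "--env_name dc_bossfight --n_train_envs 128 "
    "--n_training_steps 200000000 --log_dir ./recurr --recur True".split()
)

# Assumption: the step budget is counted across all parallel environments.
steps_per_env = args.n_training_steps // args.n_train_envs
print(args.recur, steps_per_env)  # True 1562500
```

Under that assumption, 200,000,000 total steps spread over 128 environments comes to 1,562,500 steps per environment.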

Files Description

Baselines ADR

  • adr_model.py - contains the model that is used to generate actions inside the environment loop
  • adr_runner.py - contains all the necessary classes and configs as well as the ParameterRunner and ADRRunner that make the ADR algorithm possible inside the training loop
  • ppo2_adr.py - training loop that runs ADR and generates data for updating the policy with PPO
  • test_runner.py - runner for evaluating the model on the three environments (easy, hard, full ADR range) during training
  • train.py - command line script for running the training algorithm
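The core ADR idea behind these files can be sketched in a few lines. This is a simplified illustration of the general algorithm, not the code in adr_runner.py: the class name ADRParameter, the function adr_update, the parameter name boss_speed, and the threshold and buffer values are all hypothetical. Each randomized parameter keeps a sampling range. Occasionally an episode is run with the parameter pinned to one boundary, and the resulting performance goes into that boundary's buffer. When the buffer fills, its mean is compared with two thresholds: high performance widens the range, low performance narrows it.

```python
import random
from dataclasses import dataclass, field
from typing import List


@dataclass
class ADRParameter:
    """One randomized environment parameter with an adjustable range."""
    name: str
    low: float       # current lower bound of the sampling range
    high: float      # current upper bound of the sampling range
    step: float      # how far a bound moves on each update
    min_low: float   # hard limit for the lower bound
    max_high: float  # hard limit for the upper bound
    low_buffer: List[float] = field(default_factory=list)
    high_buffer: List[float] = field(default_factory=list)

    def sample(self) -> float:
        """Draw a value for a regular training episode."""
        return random.uniform(self.low, self.high)


def adr_update(param: ADRParameter, at_high: bool, episode_return: float,
               buffer_size: int = 10, t_low: float = 0.2,
               t_high: float = 0.8) -> None:
    """Record a boundary episode and adjust that boundary once the buffer is full."""
    buffer = param.high_buffer if at_high else param.low_buffer
    buffer.append(episode_return)
    if len(buffer) < buffer_size:
        return
    mean_return = sum(buffer) / len(buffer)
    buffer.clear()
    if mean_return >= t_high:
        # The agent copes with this boundary, so widen the range.
        if at_high:
            param.high = min(param.high + param.step, param.max_high)
        else:
            param.low = max(param.low - param.step, param.min_low)
    elif mean_return <= t_low:
        # The agent struggles here, so narrow the range without crossing bounds.
        if at_high:
            param.high = max(param.high - param.step, param.low)
        else:
            param.low = min(param.low + param.step, param.high)


# Ten successful episodes at the upper boundary widen the range by one step.
speed = ADRParameter("boss_speed", low=1.0, high=1.0, step=0.5,
                     min_low=0.0, max_high=5.0)
for _ in range(10):
    adr_update(speed, at_high=True, episode_return=1.0)
print(speed.low, speed.high)  # 1.0 1.5
```

Starting from a single point (low equal to high) and letting the range grow only when the agent performs well is what makes the curriculum automatic.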

Test Agent

  • test.py - contains functions to play test environments and return results
  • procgen_test.py - loads a trained model and runs it on the specified environment config
  • plot_results.ipynb - simple notebook to plot and compare training results of different models
  • models/ - model checkpoints used for evaluating performance
  • configs/ - environment configurations used for evaluating performance
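Evaluating a trained agent on several environment configs follows a common pattern: play a fixed number of episodes per config and average the episode returns. The sketch below shows that pattern only. ToyEnv, evaluate, and the config names are hypothetical stand-ins, and the three-value step() result is a simplification rather than the interface used by test.py or Procgen.

```python
from statistics import mean
from typing import Callable, Dict, Tuple


class ToyEnv:
    """Stand-in environment: pays reward 1.0 per step for `length` steps."""

    def __init__(self, length: int) -> None:
        self.length = length
        self.t = 0

    def reset(self) -> int:
        self.t = 0
        return self.t

    def step(self, action: int) -> Tuple[int, float, bool]:
        self.t += 1
        done = self.t >= self.length
        return self.t, 1.0, done


def evaluate(make_env: Callable[[], ToyEnv],
             policy: Callable[[int], int],
             n_episodes: int) -> float:
    """Play n_episodes in fresh environments and return the mean episode return."""
    returns = []
    for _ in range(n_episodes):
        env = make_env()
        obs, done, total = env.reset(), False, 0.0
        while not done:
            obs, reward, done = env.step(policy(obs))
            total += reward
        returns.append(total)
    return mean(returns)


# Hypothetical configs, each mapped to an environment factory.
configs: Dict[str, Callable[[], ToyEnv]] = {
    "config_a": lambda: ToyEnv(3),
    "config_b": lambda: ToyEnv(5),
}
results = {name: evaluate(make_env, policy=lambda obs: 0, n_episodes=4)
           for name, make_env in configs.items()}
print(results)  # {'config_a': 3.0, 'config_b': 5.0}
```

Keeping the per-config results in a dictionary makes it straightforward to compare models side by side, for example in a plotting notebook.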
