Important Note: the code for Acme and Stable-Baselines3 is adapted from the original repositories (https://github.com/deepmind/acme) and (https://github.com/DLR-RM/stable-baselines3), respectively.
CS523 Team6: Deep Reinforcement Learning on Atari Game Environments
Team Members
- Hayato Nakamura (nhayato@bu.edu)
- Ngozi Omatu (nomatu@bu.edu)
Overview
Most classical RL algorithms rely heavily on hand-crafted features and linear value/policy functions, so the quality of the feature representation has a large impact on their performance. Deep Reinforcement Learning allows an agent to learn to make decisions by trial and error without manual engineering of state features. Trained end to end, Deep RL agents can surpass human performance on some tasks, and the approach is well suited to dynamic environments.
The goal of this project is to reproduce the results (average reward per episode) of DeepMind's Deep Q-Network (DQN) from "Playing Atari with Deep Reinforcement Learning" on Atari games, using two deep reinforcement learning libraries: (1) DM-Acme and (2) Stable-Baselines3. We focus on two Atari games: Pong and Breakout.
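For reference, the DQN in the original paper regresses Q(s, a; θ) toward the target y = r for terminal transitions and y = r + γ max_a' Q(s', a') otherwise, using a squared-error loss (y − Q(s, a; θ))². In the paper, Q(s', a') is computed with the parameters from the previous iteration; both Acme and Stable-Baselines3 use a periodically updated target network for this. The sketch below computes the targets and the mean loss for a toy minibatch in plain Python. The Q-values are made-up numbers standing in for network outputs, and `dqn_targets` / `dqn_loss` are hypothetical helpers, not part of either library.

```python
from typing import List, Sequence


def dqn_targets(
    rewards: Sequence[float],
    dones: Sequence[bool],
    next_q_values: Sequence[Sequence[float]],
    gamma: float = 0.99,  # assumed default; a common choice, not taken from the paper
) -> List[float]:
    """y = r for terminal transitions, y = r + gamma * max_a' Q(s', a') otherwise."""
    targets = []
    for r, done, q_next in zip(rewards, dones, next_q_values):
        if done:
            targets.append(r)
        else:
            targets.append(r + gamma * max(q_next))
    return targets


def dqn_loss(
    q_values: Sequence[Sequence[float]],
    actions: Sequence[int],
    targets: Sequence[float],
) -> float:
    """Mean squared error between y and Q(s, a) for the actions actually taken."""
    errors = [(y - q[a]) ** 2 for q, a, y in zip(q_values, actions, targets)]
    return sum(errors) / len(errors)


# Toy minibatch: 2 transitions, 3 actions (illustrative numbers only).
rewards = [1.0, 0.0]
dones = [False, True]
next_q = [[0.5, 2.0, 1.0], [3.0, 3.0, 3.0]]
q = [[1.0, 2.5, 0.0], [0.25, 0.75, 0.5]]
actions = [1, 2]

targets = dqn_targets(rewards, dones, next_q, gamma=0.5)  # gamma=0.5 keeps the arithmetic exact
loss = dqn_loss(q, actions, targets)
print(targets, loss)  # → [2.0, 0.0] 0.25
```

Note that the second transition is terminal, so its target ignores the next-state Q-values entirely.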
Demo
Pong (DM-Acme Version; 10,000 episodes)
Breakout (DM-Acme Version; 15,000 episodes)
Pong (Stable-Baselines3 Version; 500,000 timesteps)
Breakout (Stable-Baselines3 Version; 11,000,000 timesteps)
Presentation Slide
Link to our Presentation Slide. Note: To access the presentation document, you need to log in to a Google account with your Boston University credentials.
How to Run the Code
DM-Acme
$ git clone https://github.com/raltech/CS523_Project.git
$ cd ./CS523_Project
$ pip install -r ./Acme/acme_requirements.txt
To train the DQN agent on the Pong environment:
- Open ./Acme/dm_acme_pong.ipynb
- Execute the cells from top to bottom.
To train the DQN agent on the Breakout environment:
- Open ./Acme/dm_acme_breakout.ipynb
- Execute the cells from top to bottom.
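Both DQN implementations explore with an ε-greedy policy. The original paper anneals ε linearly from 1.0 to 0.1 over the first million frames and then holds it at 0.1. The sketch below implements that schedule and the ε-greedy action choice in plain Python. The function names are hypothetical, and the exploration settings used by Acme and Stable-Baselines3 in our notebooks may differ from these paper values.

```python
import random
from typing import Sequence


def linear_epsilon(step: int, start: float = 1.0, end: float = 0.1,
                   anneal_steps: int = 1_000_000) -> float:
    """Linearly anneal epsilon from `start` to `end`, then hold it at `end`."""
    if step >= anneal_steps:
        return end
    frac = step / anneal_steps
    return start + frac * (end - start)


def epsilon_greedy(q_values: Sequence[float], epsilon: float,
                   rng: random.Random) -> int:
    """With probability epsilon pick a uniformly random action, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


rng = random.Random(0)
print(linear_epsilon(0), linear_epsilon(500_000), linear_epsilon(2_000_000))
print(epsilon_greedy([0.1, 0.7, 0.3], epsilon=0.0, rng=rng))  # greedy: action 1
```

Passing an explicit `random.Random` instance keeps the action choice reproducible across runs.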
Checkpoints for the trained agents:
- The checkpoints for the trained Pong agent (10,000 episodes) can be downloaded from here: (https://drive.google.com/file/d/1KeaD8ZdHCbv3-qOr7rK4_8hpXY8anNXR/view?usp=sharing).
- The checkpoints for the trained Breakout agent (55,000 episodes) can be downloaded from here: (https://drive.google.com/file/d/1z25WiW0TUCtjTLu35C-hKdJNAiXxSzqS/view?usp=sharing). You need to log in with your Boston University credentials to access these files.
Stable-Baselines3
$ git clone https://github.com/raltech/CS523_project.git
$ cd ./CS523_project
To train the DQN agent on the Pong environment:
- Open ./Stable_baselines/Pong_rl_baselines_zoo.ipynb
- Execute the cells from top to bottom. Note: To run the trained model, the zip file for PongNoFrameskip should be added to the logs/dqn folder. This folder is created after cloning the git repository for the Stable-Baselines3 Zoo (rl-baselines3-zoo).
The pretrained model file for Pong is available in the same branch as the code.
To train the DQN agent on the Breakout environment:
- Open ./Stable_baselines/Breakout_rl_baselines_zoo.ipynb
- Execute the cells from top to bottom.
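The Stable-Baselines3 notebooks appear to use the "NoFrameskip" Atari variant (PongNoFrameskip), which returns every emulator frame, so frame skipping is added by a wrapper. The original paper repeats each selected action for k = 4 frames and clips rewards to -1, 0 or +1. The sketch below shows those two ideas on a hypothetical toy environment (`ToyEnv` and `FrameSkip` are illustrative, not Gym or Stable-Baselines3 classes); it sums the rewards of the skipped frames and then clips the total. Stable-Baselines3's own Atari wrappers also apply further preprocessing that is not shown here.

```python
from typing import List, Tuple


class ToyEnv:
    """Hypothetical stand-in for an Atari env: returns a scripted reward per frame."""

    def __init__(self, frame_rewards: List[float]):
        self.frame_rewards = frame_rewards
        self.t = 0

    def step(self, action: int) -> Tuple[int, float, bool]:
        # The action is ignored here; a real env would use it.
        reward = self.frame_rewards[self.t]
        self.t += 1
        done = self.t >= len(self.frame_rewards)
        return self.t, reward, done  # observation is just the frame index


def sign_clip(reward: float) -> float:
    """Map positive rewards to +1, negative to -1, and leave 0 unchanged."""
    return float((reward > 0) - (reward < 0))


class FrameSkip:
    """Repeat each action for k frames, summing rewards, then clip the total."""

    def __init__(self, env: ToyEnv, k: int = 4):
        self.env = env
        self.k = k

    def step(self, action: int) -> Tuple[int, float, bool]:
        total, obs, done = 0.0, 0, False
        for _ in range(self.k):
            obs, reward, done = self.env.step(action)
            total += reward
            if done:  # stop early if the episode ends mid-skip
                break
        return obs, sign_clip(total), done


env = FrameSkip(ToyEnv([0, 0, 3, 0, -1, 0, 0, 0, 5]), k=4)
results = []
done = False
while not done:
    obs, reward, done = env.step(action=0)
    results.append((obs, reward, done))
print(results)  # → [(4, 1.0, False), (8, -1.0, False), (9, 1.0, True)]
```

The agent therefore makes one decision per four frames, and reward magnitudes are ignored, which lets the same learning rate work across games with very different score scales.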
Results
Agent | Pong | Breakout |
---|---|---|
Random | -20.4 | 1.2 |
Sarsa | -19 | 5.2 |
Contingency | -17 | 6 |
Human | -3 | 31 |
DQN (Original) | 20 | 168 |
DQN (Ours; DM-Acme) | 21 | In Progress (102) |
DQN (Ours; Stable-Baselines3) | 20.4 | In Progress (0) |
- Numbers are the average total reward per episode obtained by each agent in each game.
- The results for the first five rows (Random, Sarsa, Contingency, Human, and DQN (Original)) are obtained from the original paper (Playing Atari with Deep Reinforcement Learning).
- On Pong, both of our models (DM-Acme and Stable-Baselines3) slightly surpassed the original DQN result (20) reported in the paper.
- Breakout takes significantly more time to train than Pong. Although we have not yet reached the original score (168), we see a steady increase in our agents' performance, and we expect that with more training time we can obtain a similar or better result on Breakout as well.
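The scores above are average total rewards per episode (the paper reports them for an ε-greedy policy with ε = 0.05 run for a fixed number of steps). A minimal sketch of that bookkeeping, assuming you have logged per-step rewards and episode-end flags from an evaluation run; `episode_returns` and `average_return` are hypothetical helpers, not how either library records returns.

```python
from typing import List, Sequence


def episode_returns(rewards: Sequence[float], dones: Sequence[bool]) -> List[float]:
    """Split a stream of per-step rewards into completed-episode totals."""
    returns, running = [], 0.0
    for r, done in zip(rewards, dones):
        running += r
        if done:
            returns.append(running)
            running = 0.0
    return returns  # a trailing, unfinished episode is dropped


def average_return(returns: Sequence[float]) -> float:
    """Mean total reward over completed episodes."""
    return sum(returns) / len(returns)


# Two finished episodes plus one unfinished one (illustrative numbers only).
rewards = [1, -1, -1, 1, 1, 1, 1, -1]
dones = [False, False, True, False, False, True, False, False]
rets = episode_returns(rewards, dones)
print(rets, average_return(rets))  # → [-1.0, 3.0] 1.0
```

Dropping the unfinished episode avoids biasing the average with a partial score.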
References
- Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller (2013). Playing Atari with Deep Reinforcement Learning. https://arxiv.org/abs/1312.5602
- Pong image used in our presentation slide: https://minpy.readthedocs.io/en/latest/tutorial/rl_policy_gradient_tutorial/rl_policy_gradient.html
- Breakout image used in our presentation slide: https://noteoneverything.blogspot.com/2018/02/reinforcement-learning-of-atari-breakout.html
- DQN architecture image used in our presentation slide: https://leonardoaraujosantos.gitbook.io/artificial-inteligence/artificial_intelligence/reinforcement_learning/deep_q_learning