PACMAN-using-DRL

This is a project mentored by Gagan Jain and Utkarsh Agarwal for WNCC's Seasons of Code 2021

Mentees

  • Utkarsh Ranjan - 200050147
  • Alakh Agrawal - 200040018
  • Nikhil Kaniyeri - 200070050
  • Akshat Gautam - 190110004

Motivation

Reinforcement Learning (RL) is a field of Artificial Intelligence in which an agent learns by interacting with its environment and receiving a reward or penalty for its actions. RL has recently started receiving a lot more attention, owing to the famous victory of an RL agent over the world champion in the game of Go. This repo contains our project, which aimed to implement RL algorithms on OpenAI Gym's PACMAN environment and get us familiar with the field.

How we did this project

We first read three chapters of Sutton and Barto's book and learnt Python; this was done before the endsems. After the endsems we started the implementation. First we wrote code for a gridworld (using this site as our reference). Once that was done, we installed the gym environment, read about the deep RL architecture, and implemented it on CartPole-v0. After that we added a replay buffer and a dual network. Once this was done, we moved on to advanced algorithms: we read these papers, found some implementations online, did some hyperparameter search, and started training the agent on Colaboratory.
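The sketch below illustrates that CartPole-v0 stage: a small Q-network, a replay buffer, and the "dual network" interpreted here as a periodically synced target network. It is a minimal illustration rather than the repo's code; it assumes the 2021-era gym API and TensorFlow/Keras, and all hyperparameters and names are placeholder values.

```python
import random
from collections import deque

import gym
import numpy as np
import tensorflow as tf
from tensorflow import keras

GAMMA = 0.99             # discount factor
BATCH_SIZE = 32
BUFFER_SIZE = 10_000
TARGET_SYNC_EVERY = 500  # environment steps between target-network syncs

def build_q_net(n_inputs, n_actions):
    # small fully-connected Q-network: state in, one Q-value per action out
    return keras.Sequential([
        keras.layers.Dense(64, activation="relu", input_shape=(n_inputs,)),
        keras.layers.Dense(64, activation="relu"),
        keras.layers.Dense(n_actions),
    ])

env = gym.make("CartPole-v0")
n_actions = env.action_space.n
q_net = build_q_net(env.observation_space.shape[0], n_actions)
target_net = build_q_net(env.observation_space.shape[0], n_actions)
target_net.set_weights(q_net.get_weights())
optimizer = keras.optimizers.Adam(1e-3)
buffer = deque(maxlen=BUFFER_SIZE)   # replay buffer of (s, a, r, s', done)

def train_step():
    # sample a minibatch and take one gradient step towards the Q-learning
    # target r + gamma * max_a' Q_target(s', a')
    batch = random.sample(buffer, BATCH_SIZE)
    states, actions, rewards, next_states, dones = map(np.array, zip(*batch))
    next_q = target_net.predict(next_states.astype(np.float32), verbose=0).max(axis=1)
    targets = (rewards + GAMMA * next_q * (1.0 - dones)).astype(np.float32)
    with tf.GradientTape() as tape:
        q_values = q_net(states.astype(np.float32))
        chosen = tf.reduce_sum(q_values * tf.one_hot(actions, n_actions), axis=1)
        loss = tf.reduce_mean(tf.square(chosen - targets))
    grads = tape.gradient(loss, q_net.trainable_variables)
    optimizer.apply_gradients(zip(grads, q_net.trainable_variables))

step = 0
for episode in range(10):                 # a tiny run, just to show the loop
    state, done = env.reset(), False
    while not done:
        if random.random() < 0.1:         # epsilon-greedy exploration
            action = env.action_space.sample()
        else:
            action = int(np.argmax(q_net.predict(state[None].astype(np.float32), verbose=0)))
        next_state, reward, done, _ = env.step(action)
        buffer.append((state, action, reward, next_state, float(done)))
        state = next_state
        step += 1
        if len(buffer) >= BATCH_SIZE:
            train_step()
        if step % TARGET_SYNC_EVERY == 0:  # periodically refresh the target net
            target_net.set_weights(q_net.get_weights())
```

Keeping the bootstrap target on a frozen copy of the network is what makes the replay-buffer updates stable; without it, the Q-learning target moves on every gradient step.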

What we learned

  • You can make impossible things possible if you learn from tons of your mistakes and correct them along the way. #ReinforcementLearning
  • Usage of standard Python libraries like NumPy and Matplotlib
  • Fundamentals of RL algorithms: MDPs, TD learning, Monte Carlo methods, Q-learning and SARSA (see the sketch after this list)
  • Usage of ML frameworks for deep learning like TensorFlow and Keras
  • Familiarity with OpenAI Gym, a toolkit for developing and comparing RL algorithms
  • Advanced and recent developments in RL like NoisyNets and Rainbow
  • Fundamentals involved in training an RL agent, like tuning hyperparameters and loading and saving a deep learning model
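As a concrete reminder of those fundamentals, here is a minimal tabular Q-learning loop. The gym-style environment interface, the function name and all constants are illustrative; they are not taken from the repo's assignments.

```python
import numpy as np

ALPHA, GAMMA, EPSILON = 0.1, 0.99, 0.1   # illustrative hyperparameters

def q_learning(env, n_states, n_actions, episodes=500):
    # Q-table with one row per state and one column per action
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # epsilon-greedy behaviour policy
            if np.random.rand() < EPSILON:
                action = np.random.randint(n_actions)
            else:
                action = int(np.argmax(Q[state]))
            next_state, reward, done, _ = env.step(action)
            # off-policy TD update: bootstrap from the greedy action in s'
            Q[state, action] += ALPHA * (
                reward + GAMMA * np.max(Q[next_state]) * (not done) - Q[state, action]
            )
            state = next_state
    return Q
```

Replacing the `np.max(Q[next_state])` term with the value of the action actually taken next turns this into SARSA, the on-policy counterpart mentioned above.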

Contains

  • A folder with the k-armed bandit assignment (assignment)
  • A folder with the gridworld assignment
  • A folder with the gym code (atari-games)
  • A folder of research papers
  • Code implementing the research papers
  • Code used for training (from an intermediate model to the final code)
  • Other code consists of intermediate networks written during development

How to use

  • This is the final code, written by Utkarsh Ranjan. It needs to be run on Colaboratory to execute successfully.
  • It is necessary to mount Google Drive and create the folders buffers, models, cum_rewards and plots at the path '/content/drive/MyDrive/pacman_SOC_outputs/' (see the setup sketch after these points).
  • In[1] installs all the dependencies required to run the gym environment in Colab, including the ROM for the Atari game and the IPython display dependencies.
  • Further, the code is well commented; the network used is NoisyNet_Dueling, though other networks were made during development (a sketch of a noisy layer appears at the end of this section).
  • In[2] and In[3] are for the IPython display.
  • All hyperparameters are in In[94].
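One way to satisfy the Drive requirement above, assuming the notebook is run on Colab. Only the output path comes from the instructions; the rest is a sketch rather than the notebook's own setup cell.

```python
import os
from google.colab import drive

# mount Google Drive and create the output folders the notebook expects
drive.mount('/content/drive')

BASE = '/content/drive/MyDrive/pacman_SOC_outputs/'
for sub in ('buffers', 'models', 'cum_rewards', 'plots'):
    os.makedirs(os.path.join(BASE, sub), exist_ok=True)
```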
  • The other notebook is written by Nikhil Kaniyeri. It runs directly in Visual Studio Code and requires installing gym and gym[atari] locally.
  • You also need to download the ROM for Pac-Man and import it into the library. This is a one-time process and is detailed here.
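For reference, below is a sketch of the factorised noisy layer that NoisyNet-style networks (such as the NoisyNet_Dueling network used in the final notebook) are built from, following Fortunato et al. (2017). The class name and details are illustrative and not the repo's exact implementation.

```python
import tensorflow as tf
from tensorflow import keras

class NoisyDense(keras.layers.Layer):
    """Dense layer with factorised Gaussian parameter noise (NoisyNet)."""

    def __init__(self, units, **kwargs):
        super().__init__(**kwargs)
        self.units = units

    def build(self, input_shape):
        self.n_in = int(input_shape[-1])
        sigma0 = 0.5 / self.n_in ** 0.5
        mu_init = keras.initializers.RandomUniform(-1.0 / self.n_in ** 0.5,
                                                   1.0 / self.n_in ** 0.5)
        self.w_mu = self.add_weight(name="w_mu", shape=(self.n_in, self.units),
                                    initializer=mu_init)
        self.w_sigma = self.add_weight(name="w_sigma", shape=(self.n_in, self.units),
                                       initializer=keras.initializers.Constant(sigma0))
        self.b_mu = self.add_weight(name="b_mu", shape=(self.units,),
                                    initializer=mu_init)
        self.b_sigma = self.add_weight(name="b_sigma", shape=(self.units,),
                                       initializer=keras.initializers.Constant(sigma0))

    @staticmethod
    def _f(x):
        # factorised-noise transform from the paper: f(x) = sign(x) * sqrt(|x|)
        return tf.sign(x) * tf.sqrt(tf.abs(x))

    def call(self, inputs):
        # fresh factorised Gaussian noise on every forward pass, so exploration
        # comes from the weights themselves rather than an epsilon-greedy policy
        eps_in = self._f(tf.random.normal((self.n_in, 1)))
        eps_out = self._f(tf.random.normal((1, self.units)))
        w = self.w_mu + self.w_sigma * (eps_in * eps_out)
        b = self.b_mu + self.b_sigma * tf.squeeze(eps_out, axis=0)
        return tf.matmul(inputs, w) + b
```

In a dueling head, two such layers can replace the usual Dense layers for the value and advantage streams, e.g. value = NoisyDense(1)(features) and advantage = NoisyDense(n_actions)(features).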

Results

MP4 initially (Ep 1)

MP4 after training for Ep 270. It can be clearly observed how the agent learned to avoid the ghosts after she died in her first attempt. Further, the agent learned to eat the fruit to repel the ghosts and to earn points by eating them.

MP4 after training for Ep 600. By now the agent has started avoiding the ghosts (clear in this video); at the same time, she has learned not to avoid them when they are blue.

This contains all the outputs: models, plots and videos. The plots are not continuous due to non-continuous training on Colab.
