KerolosAtef / Atari-Games-by-Q_learning

Reinforcement Learning Project

Overview :

Available Environments
1-Taxi-v3
2-FrozenLake-v1
3-CliffWalking-v0
You can choose any of them; the default is Taxi.

Problem definition :

There are 4 locations (labeled by different letters), and our job is to pick up the passenger at one location and drop them off at another. We receive +20 points for a successful drop-off and lose 1 point for every time-step it takes. There is also a 10-point penalty for illegal pick-up and drop-off actions.

Introduction :

In this project we implement the Q-learning algorithm, observe how decaying hyperparameters such as the learning rate, discount factor, and epsilon affects the results, and run a grid search to select the best parameters.
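As a rough illustration of what implementing Q-learning involves, the sketch below shows the standard tabular update and an epsilon-greedy action choice. The function names `epsilon_greedy` and `q_update`, the list-of-lists Q-table, and the tiny 2-state example are assumptions made for this sketch; the notebook may structure this differently (for example with a NumPy array).

```python
import random


def epsilon_greedy(q_row, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    # Greedy choice; ties go to the lowest action index.
    return max(range(len(q_row)), key=lambda a: q_row[a])


def q_update(q_table, state, action, reward, next_state, alpha, gamma, done=False):
    """Apply one tabular Q-learning update in place and return the new value."""
    # Do not bootstrap from a terminal state.
    best_next = 0.0 if done else max(q_table[next_state])
    td_target = reward + gamma * best_next
    q_table[state][action] += alpha * (td_target - q_table[state][action])
    return q_table[state][action]


# Tiny illustration (hypothetical values): 2 states, 2 actions.
q = [[0.0, 0.0], [0.0, 1.0]]
new_value = q_update(q, state=0, action=1, reward=-1, next_state=1,
                     alpha=0.5, gamma=0.9)
```

Here `alpha` is the learning rate, `gamma` the discount factor, and `epsilon` the exploration probability, matching the three hyperparameters named above.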

Requirements :

Install the following libraries to run the project:

!pip install gym
!pip install numpy

Default environment info

Taxi-env

The job is to pick up the passenger at one location and drop them off at another. Here are a few things that we'd love our taxi to take care of:

  • Drop off the passenger at the right location.
  • Save the passenger's time by taking the minimum time possible to drop them off.
  • Take care of the passenger's safety and traffic rules.
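To make the reward scheme concrete, here is a small worked calculation of an episode's total reward. It assumes the way Gym's Taxi environment applies the rewards: each action costs 1 point unless it triggers another reward, so the final drop-off yields +20 and an illegal pick-up or drop-off yields -10 in place of the usual -1. The helper `episode_return` is a hypothetical name used only for this sketch.

```python
STEP_REWARD = -1        # ordinary time-step
DROPOFF_REWARD = 20     # successful drop-off
ILLEGAL_PENALTY = -10   # illegal pick-up or drop-off


def episode_return(num_steps, num_illegal_actions):
    """Total reward of an episode that ends with a successful drop-off.

    num_steps counts every action taken, including illegal ones and the
    final drop-off.
    """
    ordinary_steps = num_steps - num_illegal_actions - 1
    return (ordinary_steps * STEP_REWARD
            + num_illegal_actions * ILLEGAL_PENALTY
            + DROPOFF_REWARD)


clean_run = episode_return(num_steps=10, num_illegal_actions=0)   # 9 * -1 + 20
sloppy_run = episode_return(num_steps=12, num_illegal_actions=1)  # 10 * -1 - 10 + 20
```

Fewer steps and fewer illegal actions both raise the total, which is why a shorter route shows up directly in the score.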

Training :

Random Action :

Trying random actions to see how the agent moves

Random Actions
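A random-action baseline along these lines could look like the sketch below. It assumes `env` is a Gym-style environment (for example one created with `gym.make("Taxi-v3")`) and tolerates both the older 4-tuple and the newer 5-tuple return value of `env.step`. The function name `run_random_episode` and the `max_steps` cap are assumptions for illustration, not taken from the notebook.

```python
def run_random_episode(env, max_steps=200):
    """Run one episode with uniformly random actions.

    Returns (total_reward, steps_taken, penalties), where penalties counts
    the actions that received the -10 illegal pick-up/drop-off reward.
    """
    env.reset()
    total_reward, penalties = 0, 0
    for step in range(1, max_steps + 1):
        action = env.action_space.sample()
        result = env.step(action)
        if len(result) == 5:
            # Newer API: observation, reward, terminated, truncated, info
            _, reward, terminated, truncated, _ = result
            done = terminated or truncated
        else:
            # Older API: observation, reward, done, info
            _, reward, done, _ = result
        total_reward += reward
        if reward == -10:
            penalties += 1
        if done:
            return total_reward, step, penalties
    return total_reward, max_steps, penalties
```

Running this before and after training gives a simple baseline for the number of steps and penalties per episode.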

Q-Learning

After the agent has been trained

Q-learning

Comparing this with the random-action run shows the difference training makes in how the agent behaves

Decaying hyper parameters while training

Decay
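One common way to decay a hyperparameter during training is to multiply it by a constant factor each episode, with a floor so it never reaches zero. The sketch below assumes that exponential schedule; the source does not state which schedule the notebook uses, and the name `decayed` and the example rates are hypothetical.

```python
def decayed(initial, decay_rate, episode, minimum=0.0):
    """Exponentially decayed value after `episode` episodes, floored at `minimum`."""
    return max(minimum, initial * (decay_rate ** episode))


# Illustrative schedule (assumed values): epsilon starts at 0.9 and halves
# every episode until it hits a floor of 0.1.
epsilon_schedule = [decayed(0.9, 0.5, ep, minimum=0.1) for ep in range(5)]
```

The same helper can be applied to the learning rate or the discount factor by passing a different starting value and decay rate.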

Evaluation

Eval_100

Grid search

We used a brute-force search over every combination in the grid to find the best hyperparameters
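A brute-force grid search of this kind can be sketched with `itertools.product`. The `evaluate` callback stands in for "train an agent with these values and return a score where higher is better"; it, the candidate values, and the toy scoring function below are placeholders for illustration, not the notebook's actual grid or metric.

```python
from itertools import product


def grid_search(evaluate, alphas, gammas, epsilons):
    """Try every (alpha, gamma, epsilon) combination.

    Returns (best_params, best_score, all_results), where higher scores are
    better and all_results lists every combination with its score.
    """
    results = []
    for alpha, gamma, epsilon in product(alphas, gammas, epsilons):
        score = evaluate(alpha, gamma, epsilon)
        results.append(((alpha, gamma, epsilon), score))
    best_params, best_score = max(results, key=lambda item: item[1])
    return best_params, best_score, results


# Toy stand-in for a real train-and-evaluate run (hypothetical).
def toy_score(alpha, gamma, epsilon):
    return alpha + gamma - epsilon


candidates = [0.1, 0.5, 0.9]
best_params, best_score, all_results = grid_search(
    toy_score, candidates, candidates, candidates
)
```

With three candidates per hyperparameter this evaluates 27 combinations; the cost grows multiplicatively with each added value, which is the main drawback of brute force.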

Grid search values

Experiments table

parameters

Best hyperparameters :

alpha=0.9, gamma=0.9, epsilon=0.9

Grid search evaluation

Grid search

Plot of the penalties and the number of epochs in each iteration after training with the best parameters

errors
