Available Environments
1. Taxi-v3
2. FrozenLake-v1
3. CliffWalking-v0
You can choose any of them; the default is Taxi-v3.
Problem definition:
There are four locations (labeled by different letters), and our job is to pick up
the passenger at one location and drop them off at another. We receive +20
points for a successful drop-off and lose 1 point for every time-step it
takes. There is also a 10-point penalty for each illegal pick-up or drop-off
action.
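To make the reward arithmetic concrete, here is a minimal sketch of the total reward of one episode. It assumes that every time-step yields exactly one reward, so the final drop-off step earns +20 and an illegal step earns -10 in place of the usual -1. The function name and its parameters are illustrative and not part of the project code.

```python
# Reward values taken from the problem definition.
DROP_OFF_REWARD = 20
STEP_PENALTY = -1
ILLEGAL_ACTION_PENALTY = -10


def episode_return(total_steps: int, illegal_actions: int) -> int:
    """Total reward for an episode that ends with a successful drop-off.

    Assumption: each time-step earns exactly one of the three rewards,
    and the last step is the successful drop-off.
    """
    if total_steps < 1 or not 0 <= illegal_actions < total_steps:
        raise ValueError("the last step must be the drop-off")
    ordinary_steps = total_steps - 1 - illegal_actions
    return (DROP_OFF_REWARD
            + illegal_actions * ILLEGAL_ACTION_PENALTY
            + ordinary_steps * STEP_PENALTY)


print(episode_return(total_steps=13, illegal_actions=0))  # 20 - 12 = 8
print(episode_return(total_steps=13, illegal_actions=2))  # 20 - 20 - 10 = -10
```

Under this assumption, a shorter route and fewer illegal actions both raise the return, which is what the agent is trained to maximize.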
Introduction
In this project we implement the Q-learning algorithm and examine how decaying the hyperparameters (learning rate, discount factor, and epsilon) affects the results. We also implement a grid search to select the best hyperparameters.
The following libraries must be installed to run the project:
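As a rough illustration of the Q-learning update with decaying hyperparameters, the sketch below trains on a tiny hypothetical corridor environment instead of Taxi, so it runs without gym. The environment, the function names, the decay schedule (multiplicative decay with a floor), and the default values are all assumptions made for this example. For simplicity it decays only the learning rate and epsilon and keeps the discount factor fixed.

```python
import random

N_STATES = 5          # hypothetical corridor: states 0..4, goal at state 4
ACTIONS = (0, 1)      # 0 = move left, 1 = move right


def step(state, action):
    """Toy stand-in environment: -1 per step, +20 on reaching the goal."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (20 if done else -1), done


def train(alpha=0.9, gamma=0.9, epsilon=0.9, decay=0.99,
          min_alpha=0.1, min_epsilon=0.05, episodes=500, seed=0):
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done, steps = 0, False, 0
        while not done and steps < 100:
            # Epsilon-greedy: explore with probability epsilon, else exploit.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[state][a])
            next_state, reward, done = step(state, action)
            # Q-learning update; a terminal state has no future value.
            best_next = 0.0 if done else max(q[next_state])
            td_target = reward + gamma * best_next
            q[state][action] += alpha * (td_target - q[state][action])
            state, steps = next_state, steps + 1
        # Decay exploration and learning rate after each episode, with floors.
        epsilon = max(min_epsilon, epsilon * decay)
        alpha = max(min_alpha, alpha * decay)
    return q


q_table = train()
# Greedy action in every non-terminal state (1 means "move right").
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a])
                 for s in range(N_STATES - 1)]
print(greedy_policy)
```

A high initial epsilon makes the agent explore early on, and the decay gradually shifts it toward exploiting the learned Q-values.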
!pip install gym
!pip install numpy
The job is to pick up the passenger at one location and drop them off in another. Here are a few things that we'd love our taxi to take care of:
- Drop off the passenger at the right location.
- Save the passenger's time by taking the minimum possible time to drop them off.
- Take care of the passenger's safety and obey traffic rules.
Trying random actions to see how the agent moves
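A random-action baseline could look like the sketch below. It assumes an environment object with the gym-style `reset()`, `step()`, and `action_space.sample()` methods and accepts both the 4-value and the 5-value return of `step()`. The helper name `run_random_episode` and the `StubEnv` class are hypothetical; the stub exists only so the example runs without gym installed.

```python
import random


def run_random_episode(env, max_steps=200):
    """Run one episode with random actions.

    Returns (steps, penalties, total_reward), where a penalty is any step
    that earned the -10 reward for an illegal pick-up or drop-off.
    """
    env.reset()
    steps = penalties = total_reward = 0
    done = False
    while not done and steps < max_steps:
        result = env.step(env.action_space.sample())
        # Older gym versions return 4 values, newer ones return 5.
        if len(result) == 5:
            _, reward, terminated, truncated, _ = result
            done = terminated or truncated
        else:
            _, reward, done, _ = result
        steps += 1
        total_reward += reward
        if reward == -10:
            penalties += 1
    return steps, penalties, total_reward


class _StubSpace:
    """Hypothetical action space with three actions."""
    def __init__(self, seed):
        self.rng = random.Random(seed)

    def sample(self):
        return self.rng.randrange(3)


class StubEnv:
    """Hypothetical stand-in for a gym environment (not Taxi itself)."""
    def __init__(self, seed=0):
        self.action_space = _StubSpace(seed)

    def reset(self):
        return 0

    def step(self, action):
        # 0: ordinary move, 1: illegal action, 2: successful drop-off
        reward = (-1, -10, 20)[action]
        return 0, reward, action == 2, {}


print(run_random_episode(StubEnv()))
```

With gym installed, the same helper should accept `gym.make("Taxi-v3")` in place of `StubEnv()`. A random agent typically needs many steps and collects many penalties, which gives a baseline to compare the trained agent against.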
After the agent has been trained
We can see the difference in the agent's behavior after training.
We used a brute-force search to find the best hyperparameters.
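A brute-force (grid) search over the three hyperparameters can be sketched as follows. The `evaluate` callable is a hypothetical placeholder for "train an agent with these settings and return a score"; the toy scoring function and the candidate values are made up for illustration and are unrelated to the project's actual results.

```python
from itertools import product


def grid_search(evaluate, alphas, gammas, epsilons):
    """Try every (alpha, gamma, epsilon) combination and keep the best score.

    `evaluate` is a hypothetical callable returning a score to maximize,
    for example the average reward of an agent trained with those settings.
    """
    results = []
    for alpha, gamma, epsilon in product(alphas, gammas, epsilons):
        score = evaluate(alpha, gamma, epsilon)
        results.append(((alpha, gamma, epsilon), score))
    best_params, best_score = max(results, key=lambda item: item[1])
    return best_params, best_score, results


def toy_score(alpha, gamma, epsilon):
    """Toy scoring function standing in for training and evaluation."""
    return -((alpha - 0.5) ** 2 + (gamma - 0.9) ** 2 + (epsilon - 0.1) ** 2)


values = [0.1, 0.5, 0.9]
best_params, best_score, results = grid_search(toy_score, values, values, values)
print(best_params, len(results))
```

The number of combinations grows multiplicatively with the number of candidate values per hyperparameter (here 3 x 3 x 3 = 27), so each training run in the grid should be kept short. The `results` list can serve as the basis for an experiments table.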
Experiments table
Best hyperparameters:
alpha=0.9, gamma=0.9, epsilon=0.9
Grid search evaluation
Plot the penalties and the number of epochs in each iteration after training with the best hyperparameters