OrestisMk / RF-Q_learning-taxi_driver--Lunanlander-Policy-gradient-

This reinforcement learning project covers two environments. The first is the taxi driver problem on a 4x4 grid, solved with the simple Q-learning update rule; in this task, we compare the performance of the ε-greedy policy and the Boltzmann policy. The second is LunarLander from OpenAI Gym, for which a policy gradient method was chosen.
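As a rough illustration of the first task, the sketch below shows a tabular Q-learning update together with the two action-selection policies being compared. It is not taken from the repository: the function names, the list-of-lists Q-table layout, and the parameters `alpha`, `gamma`, `epsilon`, and `temperature` are all assumptions made for this example.

```python
import math
import random


def epsilon_greedy(q_row, epsilon, rng):
    """With probability epsilon pick a random action, otherwise a greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    best = max(q_row)
    # Break ties between equally good actions at random.
    return rng.choice([a for a, q in enumerate(q_row) if q == best])


def boltzmann_probs(q_row, temperature):
    """Softmax over Q-values; the max is subtracted for numerical stability."""
    m = max(q_row)
    weights = [math.exp((q - m) / temperature) for q in q_row]
    total = sum(weights)
    return [w / total for w in weights]


def boltzmann(q_row, temperature, rng):
    """Sample an action in proportion to its Boltzmann (softmax) probability."""
    probs = boltzmann_probs(q_row, temperature)
    return rng.choices(range(len(q_row)), weights=probs, k=1)[0]


def q_update(q_table, state, action, reward, next_state, done, alpha, gamma):
    """Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    # On a terminal transition there is no bootstrapped future value.
    target = reward if done else reward + gamma * max(q_table[next_state])
    q_table[state][action] += alpha * (target - q_table[state][action])
```

In a training loop, either `epsilon_greedy` or `boltzmann` would be called on `q_table[state]` to choose the action, followed by `q_update` on the observed transition. A high `temperature` makes the Boltzmann policy close to uniform, while a low one makes it close to greedy.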


Orestis Makris
OrestisMk