Some of these algorithms are exact copies of code from other sources (books, Medium, Towards Data Science, ...).
This code is from the Medium article titled: "Reinforcement Learning, Part 4: Optimal Policy Search with MDP" written by Dan Lee
This code is an exact implementation of the Medium article titled: "Reinforcement Learning in Python, Temporal-Difference Prediction" by James Mukuya.
This implementation is derived from the Medium article titled: "Reinforcement Learning, Part 6: TD(λ) & Q-learning" by Dan Lee
The Monte Carlo method is used for policy evaluation in OpenAI Gym's Blackjack environment.
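As a rough illustration of Monte Carlo policy evaluation, the sketch below averages first-visit returns over already-sampled episodes. It is not the repository's code: the function name `first_visit_mc_evaluation`, the episode format (a list of `(state, reward)` pairs), and the two hand-written Blackjack-like episodes are all assumptions made for the example, and no Gym environment is involved.

```python
from collections import defaultdict


def first_visit_mc_evaluation(episodes, gamma=1.0):
    """Estimate V(s) as the mean first-visit return over sampled episodes.

    Each episode is a list of (state, reward) pairs, where reward is the
    reward received after acting in that state (hypothetical format).
    """
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)
    for episode in episodes:
        # Record the first time step at which each state appears.
        first_visit = {}
        for t, (state, _) in enumerate(episode):
            first_visit.setdefault(state, t)
        # Walk backwards, accumulating the discounted return G.
        g = 0.0
        for t in range(len(episode) - 1, -1, -1):
            state, reward = episode[t]
            g = gamma * g + reward
            # Only the first visit to a state contributes to its average.
            if first_visit[state] == t:
                returns_sum[state] += g
                returns_count[state] += 1
    return {s: returns_sum[s] / returns_count[s] for s in returns_sum}


# Hypothetical episodes; states are (player sum, dealer card, usable ace).
episodes = [
    [((13, 10, False), 0.0), ((20, 10, False), 1.0)],  # hit, then win
    [((13, 10, False), -1.0)],                         # lose immediately
]
values = first_visit_mc_evaluation(episodes)
```

In practice the episodes would be generated by running a fixed policy in the environment; here they are written by hand so the averaging step can be checked in isolation.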
The Monte Carlo Control method is implemented to find an optimal policy in OpenAI Gym's Blackjack environment.
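A minimal sketch of the two pieces Monte Carlo control usually combines is shown here: epsilon-greedy action selection and an incremental update of the action-value estimates from one finished episode. This is an every-visit variant chosen for brevity, not necessarily the variant used in this repository; the names `epsilon_greedy` and `mc_control_update` and the `(state, action, reward)` episode format are assumptions.

```python
import random
from collections import defaultdict


def epsilon_greedy(q, state, n_actions, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    return max(range(n_actions), key=lambda a: q[(state, a)])


def mc_control_update(q, counts, episode, gamma=1.0):
    """Update Q from one episode of (state, action, reward) triples.

    Every-visit variant: each occurrence of (state, action) moves the
    running mean of returns toward the observed return G.
    """
    g = 0.0
    for state, action, reward in reversed(episode):
        g = gamma * g + reward
        counts[(state, action)] += 1
        q[(state, action)] += (g - q[(state, action)]) / counts[(state, action)]


q = defaultdict(float)     # action-value estimates, default 0.0
counts = defaultdict(int)  # visit counts per (state, action)

# Hypothetical episode: action 1 in "s0", then action 0 in "s1" wins.
mc_control_update(q, counts, [("s0", 1, 0.0), ("s1", 0, 1.0)])
greedy_action = epsilon_greedy(q, "s0", 2, 0.0, random.Random(0))
```

Alternating these two steps (act epsilon-greedily to generate an episode, then update Q from it) gradually improves the policy while keeping some exploration.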
Core Mathematical Equation:
Core update equation:
I planned to write an exact implementation of the official PyTorch tutorial titled: Reinforcement Learning (DQN) Tutorial. However, I ran into issues while trying to implement the code.
The code is based on the YouTube tutorial video titled: "Deep Q-Learning Networks"