DavoodSZ1993/RL

Implementations of various RL algorithms, written to learn and master the underlying concepts.

Some of these implementations are exact copies of code from other sources (books, Medium, Towards Data Science, ...).

01. Optimal Policy Search:

This code is from the Medium article "Reinforcement Learning, Part 4: Optimal Policy Search with MDP" by Dan Lee.

02. Temporal-Difference Prediction:

This code follows the Medium article "Reinforcement Learning in Python, Temporal-Difference Prediction" by James Mukuya exactly.

03. A Python Realization of Q-Learning:

This implementation is derived from the Medium article "Reinforcement Learning, Part 6: TD(λ) & Q-learning" by Dan Lee.

04. Monte Carlo Prediction:

The Monte Carlo method is used for policy evaluation in OpenAI Gym's Blackjack environment.
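
As a rough illustration of Monte Carlo policy evaluation, here is a minimal first-visit sketch. It does not use Gym or Blackjack: the episodes, the state names `"A"` and `"B"`, and the function `first_visit_mc` are hypothetical, chosen only to keep the example self-contained. It is not the code in this repository.

```python
from collections import defaultdict


def first_visit_mc(episodes, gamma=1.0):
    """Estimate V(s) as the average first-visit return over the episodes.

    Each episode is a list of (state, reward) pairs, where reward is the
    reward received after leaving that state (hypothetical toy format).
    """
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)
    for episode in episodes:
        G = 0.0
        first_visit_return = {}
        # Walk backwards so G accumulates the discounted return.
        for state, reward in reversed(episode):
            G = reward + gamma * G
            # Earlier time steps overwrite later ones, so the value that
            # remains is the return following the first visit.
            first_visit_return[state] = G
        for state, g_first in first_visit_return.items():
            returns_sum[state] += g_first
            returns_count[state] += 1
    return {s: returns_sum[s] / returns_count[s] for s in returns_sum}


# Hypothetical toy episodes, not Blackjack trajectories.
toy_episodes = [
    [("A", 0.0), ("B", 1.0)],
    [("A", 0.0), ("B", -1.0)],
    [("B", 1.0)],
]
V = first_visit_mc(toy_episodes)
print(V)
```

With real Blackjack data, each episode would instead come from playing one hand under the policy being evaluated.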

05. Monte Carlo Control:

The Monte Carlo control method is implemented to find an optimal policy in OpenAI Gym's Blackjack environment.

06. SARSA - On-Policy TD Control:

Core Mathematical Equation:

$$ Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha[R_{t+1} + \gamma Q(S_{t+1}, A_{t+1}) - Q(S_t, A_t)] $$
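
The update above can be sketched as a single function over a tabular Q. The names `sarsa_update` and `Q_sarsa`, the state and action labels, and the default `alpha` and `gamma` values are assumptions made for illustration, not taken from the repository code.

```python
from collections import defaultdict


def sarsa_update(Q, state, action, reward, next_state, next_action,
                 alpha=0.1, gamma=0.99):
    """Apply one SARSA update to Q in place and return the TD error."""
    # The target uses the action actually chosen in the next state (on-policy).
    td_target = reward + gamma * Q[(next_state, next_action)]
    td_error = td_target - Q[(state, action)]
    Q[(state, action)] += alpha * td_error
    return td_error


# Hypothetical two-state example; unseen entries default to 0.0.
Q_sarsa = defaultdict(float)
Q_sarsa[("s1", "right")] = 1.0
sarsa_error = sarsa_update(Q_sarsa, "s0", "right", reward=0.0,
                           next_state="s1", next_action="right",
                           alpha=0.5, gamma=0.9)
print(Q_sarsa[("s0", "right")])
```

Here the target is 0.0 + 0.9 * 1.0 = 0.9, so the entry for `("s0", "right")` moves halfway from 0.0 toward 0.9.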

07. Q-Learning - Off-Policy TD Control:

Core update equation:

$$ Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha[R_{t+1} + \gamma \max_{a'} Q(S_{t+1}, a') - Q(S_t, A_t)] $$
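
A minimal tabular sketch of one Q-learning step follows. The target bootstraps from the largest action value in the next state, regardless of which action the behavior policy takes next, which is what makes the method off-policy. The names `q_learning_update`, `Q_ql`, and `ACTIONS`, and the toy states, are hypothetical and not taken from the repository code.

```python
from collections import defaultdict


def q_learning_update(Q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.99):
    """Apply one Q-learning update to Q in place and return the TD error."""
    # Greedy bootstrap: the best action value available in the next state.
    best_next = max(Q[(next_state, a)] for a in actions)
    td_error = reward + gamma * best_next - Q[(state, action)]
    Q[(state, action)] += alpha * td_error
    return td_error


# Hypothetical action set and values; unseen entries default to 0.0.
ACTIONS = ["left", "right"]
Q_ql = defaultdict(float)
Q_ql[("s1", "left")] = 0.2
Q_ql[("s1", "right")] = 1.0
ql_error = q_learning_update(Q_ql, "s0", "right", reward=1.0,
                             next_state="s1", actions=ACTIONS,
                             alpha=0.5, gamma=0.9)
print(Q_ql[("s0", "right")])
```

In this example the target is 1.0 + 0.9 * 1.0 = 1.9, because `"right"` has the larger value in `"s1"`.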

08. Deep Q-Network I:

I planned to reproduce the official PyTorch tutorial "Reinforcement Learning (DQN) Tutorial" exactly. However, I ran into issues while implementing the code.

09. Deep Q-Network II:

The code is based on the YouTube tutorial video titled: "Deep Q-Learning Networks"
