DavoodSZ1993 / RL

Implementations of various RL algorithms, written solely to master these concepts.

Some of these algorithms are exact copies of other sources (books, Medium, Towards Data Science, ...).

01. Optimal Policy Search:

This code is from the Medium article "Reinforcement Learning, Part 4: Optimal Policy Search with MDP" by Dan Lee.

02. Temporal-Difference Prediction:

This code is an exact implementation of the Medium article "Reinforcement Learning in Python, Temporal-Difference Prediction" by James Mukuya.
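For orientation, temporal-difference prediction in its simplest form, TD(0), nudges the value of a state toward the reward plus the discounted value of the next state. The sketch below is a minimal illustration of that single step, not the article's code; the function name `td0_update`, the dictionary value table, and the example states are assumptions made for this example.

```python
def td0_update(V, state, reward, next_state, alpha=0.1, gamma=1.0, done=False):
    """Apply one TD(0) step: V(s) <- V(s) + alpha * (r + gamma * V(s') - V(s)).

    V is a dict mapping states to value estimates (hypothetical layout).
    """
    # A terminal next state contributes no future value.
    target = reward + (0.0 if done else gamma * V[next_state])
    V[state] += alpha * (target - V[state])
    return V[state]


# Toy usage with two made-up states.
V = {"A": 0.0, "B": 1.0}
td0_update(V, "A", reward=0.5, next_state="B", alpha=0.1, gamma=0.9)
```

Because the update only needs the current transition, the estimate can be refined after every step rather than at the end of an episode.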

03. A Python Realization of Q-Learning:

This implementation is derived from the Medium article "Reinforcement Learning, Part 6: TD(λ) & Q-learning" by Dan Lee.

04. Monte Carlo Prediction:

The Monte Carlo method is used for policy evaluation in OpenAI Gym's Blackjack environment.
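As a rough illustration of what Monte Carlo policy evaluation computes, the sketch below averages first-visit returns over already-collected episodes. It is not the notebook's code: the function name `first_visit_mc` and the episode format (a list of `(state, reward)` pairs) are assumptions chosen to keep the example independent of Gym.

```python
from collections import defaultdict


def first_visit_mc(episodes, gamma=1.0):
    """Estimate V(s) as the average return following the first visit to s.

    Each episode is a list of (state, reward) pairs, where reward is the
    reward received after acting in that state (hypothetical format).
    """
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)
    for episode in episodes:
        # Record the first time step at which each state appears.
        first_visit = {}
        for t, (state, _) in enumerate(episode):
            first_visit.setdefault(state, t)
        G = 0.0
        # Walk backwards so G holds the discounted return from step t onward.
        for t in range(len(episode) - 1, -1, -1):
            state, reward = episode[t]
            G = gamma * G + reward
            if first_visit[state] == t:
                returns_sum[state] += G
                returns_count[state] += 1
    return {s: returns_sum[s] / returns_count[s] for s in returns_sum}


# Toy usage: two short episodes over hashable states.
values = first_visit_mc([[("s1", 0.0), ("s2", 1.0)], [("s1", 0.0), ("s2", -1.0)]])
```

States only need to be hashable, so tuple observations such as those returned by a Blackjack environment would fit the same table.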

05. Monte Carlo Control:

The Monte Carlo Control method is implemented to find an optimal policy in OpenAI Gym's Blackjack environment.
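Monte Carlo control needs some exploration to keep improving its action-value estimates, and one common choice is an ε-greedy policy. Whether the notebook uses exactly this scheme is an assumption; the helper name `epsilon_greedy` and the list-of-action-values input are hypothetical.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one.

    q_values is a sequence of action-value estimates for one state
    (hypothetical layout); the returned value is an action index.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    # Greedy choice: index of the largest estimate (first one on ties).
    return max(range(len(q_values)), key=lambda a: q_values[a])


# Toy usage: with epsilon=0 the choice is purely greedy.
action = epsilon_greedy([0.1, 0.9], epsilon=0.0)
```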

06. SARSA - On Policy TD Control

Core Mathematical Equation:

$$ Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha[R_{t+1} + \gamma Q(S_{t+1}, A_{t+1}) - Q(S_t, A_t)] $$
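The SARSA update can be sketched as a single function over a tabular Q. This is a minimal illustration under assumptions, not the notebook's implementation: the name `sarsa_update`, the `(state, action)` dictionary keys, and the `done` flag are choices made for this example.

```python
from collections import defaultdict


def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99, done=False):
    """Apply Q(S,A) <- Q(S,A) + alpha * (R + gamma * Q(S',A') - Q(S,A)).

    Q maps (state, action) pairs to estimates (hypothetical layout).
    """
    # On-policy target: uses the action a_next actually chosen in s_next.
    target = r + (0.0 if done else gamma * Q[(s_next, a_next)])
    Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q[(s, a)]


# Toy usage with made-up integer states and actions.
Q = defaultdict(float)
Q[(1, 0)] = 2.0
sarsa_update(Q, s=0, a=1, r=1.0, s_next=1, a_next=0, alpha=0.5, gamma=0.9)
```

The next action `a_next` comes from the same policy that is being learned, which is what makes the method on-policy.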

07. Q-Learning: Off Policy Control

Core update equation:

$$ Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha[R_{t+1} + \gamma \max_{a'} Q(S_{t+1}, a') - Q(S_t, A_t)] $$
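In Q-learning the target bootstraps from the largest action value in the next state, regardless of which action the agent takes next. The sketch below illustrates that one step for a tabular Q; the name `q_learning_update`, the `(state, action)` dictionary keys, and the explicit `actions` list are assumptions for this example rather than the notebook's code.

```python
from collections import defaultdict


def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.99, done=False):
    """Apply Q(S,A) <- Q(S,A) + alpha * (R + gamma * max_a' Q(S',a') - Q(S,A)).

    Q maps (state, action) pairs to estimates; actions lists the actions
    available in s_next (hypothetical layout).
    """
    # Off-policy target: the best estimated value in s_next, not the
    # value of the action the behaviour policy will actually take.
    best_next = 0.0 if done else max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    return Q[(s, a)]


# Toy usage with made-up integer states and two actions.
Q = defaultdict(float)
Q[(1, 0)] = 2.0
Q[(1, 1)] = 5.0
q_learning_update(Q, s=0, a=1, r=1.0, s_next=1, actions=[0, 1], alpha=0.5, gamma=0.9)
```

Compared with SARSA, the only change is the target: a maximum over next actions replaces the value of the sampled next action.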

08. Deep Q-Network I:

I planned to reproduce the official PyTorch tutorial "Reinforcement Learning (DQN) Tutorial" exactly. However, I ran into issues while trying to implement the code.

09. Deep Q-Network II:

The code is based on the YouTube tutorial video titled: "Deep Q-Learning Networks"

Languages

Language:Jupyter Notebook 100.0%