epsilon-greedy gradient-based-bandit greedy-approach jupyter-notebook python3 reinforcement-learning statistical-inference ucb-algorithm

intro-rl

This repository contains Reinforcement Learning course projects...

The goal of HW1 is to become familiar with Python and the concepts of statistical inference by comparing three different policies.

In HW2, the average reward received by epsilon-greedy, gradient-based, and UCB agents is compared under different reward distributions. The impact of tuning hyperparameters such as epsilon (in the epsilon-greedy algorithm) and the learning rate (in UCB) on the average received reward is observed as well.
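As a rough illustration of this kind of comparison, the sketch below runs epsilon-greedy and UCB agents on a small Gaussian bandit and reports the average reward. It is not taken from the notebooks: the arm means, the step count, the exploration coefficient `c`, and the helper names `run_bandit`, `epsilon_greedy`, and `ucb` are all assumptions made for the example.

```python
import math
import random


def run_bandit(select_arm, true_means, steps=2000, seed=0):
    """Run one agent on a Gaussian bandit and return its average reward."""
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k        # number of pulls per arm
    estimates = [0.0] * k   # sample-average reward estimate per arm
    total = 0.0
    for t in range(1, steps + 1):
        arm = select_arm(estimates, counts, t, rng)
        reward = rng.gauss(true_means[arm], 1.0)  # unit-variance rewards (assumed)
        counts[arm] += 1
        # Incremental update of the sample average.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total += reward
    return total / steps


def epsilon_greedy(epsilon):
    """With probability epsilon explore uniformly, otherwise pick the best estimate."""
    def select(estimates, counts, t, rng):
        if rng.random() < epsilon:
            return rng.randrange(len(estimates))
        return max(range(len(estimates)), key=lambda a: estimates[a])
    return select


def ucb(c):
    """Pick the arm with the largest estimate plus exploration bonus."""
    def select(estimates, counts, t, rng):
        for a, n in enumerate(counts):
            if n == 0:  # try every arm once before using the bonus
                return a
        return max(
            range(len(estimates)),
            key=lambda a: estimates[a] + c * math.sqrt(math.log(t) / counts[a]),
        )
    return select


true_means = [0.1, 0.5, 0.9]  # hypothetical arm means
results = {
    "eps=0.01": run_bandit(epsilon_greedy(0.01), true_means),
    "eps=0.1": run_bandit(epsilon_greedy(0.1), true_means),
    "ucb c=2": run_bandit(ucb(2.0), true_means),
}
for name, avg in results.items():
    print(f"{name}: average reward {avg:.3f}")
```

Sweeping `epsilon` or `c` over a few values and averaging over several seeds would give a fairer picture than a single run.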

In HW3, Policy Iteration and Value Iteration are compared with respect to their convergence rate, and the impact of the threshold and the discount factor on their convergence is observed.
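To make the roles of the threshold and the discount factor concrete, here is a minimal sketch of both methods on a toy problem. The 4-state chain environment, the function names, and the parameter names `gamma` (discount factor) and `theta` (stopping threshold) are assumptions for illustration and are not the environment used in the homework.

```python
# Hypothetical 4-state chain: action 0 = left, action 1 = right; state 3 is terminal.
N_STATES, ACTIONS, TERMINAL = 4, (0, 1), 3


def step(s, a):
    """Deterministic transition: returns (next_state, reward)."""
    nxt = max(s - 1, 0) if a == 0 else s + 1
    return nxt, (1.0 if nxt == TERMINAL else 0.0)


def q_value(V, s, a, gamma):
    """One-step lookahead value of taking action a in state s."""
    nxt, r = step(s, a)
    return r + gamma * V[nxt]


def value_iteration(gamma, theta):
    """Sweep until the largest value change falls below theta."""
    V = [0.0] * N_STATES
    sweeps = 0
    while True:
        delta = 0.0
        for s in range(N_STATES):
            if s == TERMINAL:
                continue
            best = max(q_value(V, s, a, gamma) for a in ACTIONS)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        sweeps += 1
        if delta < theta:
            return V, sweeps


def policy_iteration(gamma, theta):
    """Alternate policy evaluation (to tolerance theta) and greedy improvement."""
    policy = [0] * N_STATES
    V = [0.0] * N_STATES
    rounds = 0
    while True:
        # Policy evaluation for the current policy.
        while True:
            delta = 0.0
            for s in range(N_STATES):
                if s == TERMINAL:
                    continue
                v = q_value(V, s, policy[s], gamma)
                delta = max(delta, abs(v - V[s]))
                V[s] = v
            if delta < theta:
                break
        # Greedy policy improvement.
        stable = True
        for s in range(N_STATES):
            if s == TERMINAL:
                continue
            best = max(ACTIONS, key=lambda a: q_value(V, s, a, gamma))
            if best != policy[s]:
                policy[s] = best
                stable = False
        rounds += 1
        if stable:
            return V, policy, rounds


for gamma in (0.5, 0.9, 0.99):
    for theta in (1e-2, 1e-6):
        _, sweeps = value_iteration(gamma, theta)
        _, _, rounds = policy_iteration(gamma, theta)
        print(f"gamma={gamma}, theta={theta}: VI sweeps={sweeps}, PI rounds={rounds}")
```

On a deterministic chain this small, the threshold has little effect; stochastic transitions or larger state spaces make the dependence on `gamma` and `theta` much more visible.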

In HW4, the differences in convergence rate and steady-state value are observed for Q-Learning (with constant and decaying learning rate), SARSA, n-step Tree Backup, and off-policy Monte Carlo with an epsilon-greedy behavior policy (with both constant and decaying epsilon).
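The core difference between two of these algorithms, Q-Learning and SARSA, lies in the bootstrap target, which the following sketch tries to show. The 5-state chain environment, the `decayed` schedule, and all function names are assumptions for illustration; the notebooks may use different environments and decay schedules.

```python
import random


def epsilon_greedy_action(Q, s, epsilon, rng):
    """Explore with probability epsilon, otherwise act greedily (random tie-break)."""
    if rng.random() < epsilon:
        return rng.randrange(len(Q[s]))
    best = max(Q[s])
    return rng.choice([a for a, q in enumerate(Q[s]) if q == best])


def q_learning_update(Q, s, a, r, s_next, done, alpha, gamma):
    """Off-policy target: bootstrap from the best action in the next state."""
    target = r if done else r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])


def sarsa_update(Q, s, a, r, s_next, a_next, done, alpha, gamma):
    """On-policy target: bootstrap from the action actually taken next."""
    target = r if done else r + gamma * Q[s_next][a_next]
    Q[s][a] += alpha * (target - Q[s][a])


def decayed(initial, episode, rate=0.01):
    """Hypothetical decay schedule: initial / (1 + rate * episode)."""
    return initial / (1.0 + rate * episode)


def train(algorithm, episodes=300, gamma=0.9, alpha0=0.5, eps0=0.2, decay=True, seed=0):
    """Train on a hypothetical 5-state chain (0 = left, 1 = right, state 4 terminal)."""
    rng = random.Random(seed)
    n_states, terminal = 5, 4
    Q = [[0.0, 0.0] for _ in range(n_states)]
    for ep in range(episodes):
        alpha = decayed(alpha0, ep) if decay else alpha0
        epsilon = decayed(eps0, ep) if decay else eps0
        s = 0
        a = epsilon_greedy_action(Q, s, epsilon, rng)
        for _ in range(1000):  # step cap as a safety net
            s_next = max(s - 1, 0) if a == 0 else s + 1
            done = s_next == terminal
            r = 1.0 if done else 0.0
            if algorithm == "sarsa":
                a_next = epsilon_greedy_action(Q, s_next, epsilon, rng)
                sarsa_update(Q, s, a, r, s_next, a_next, done, alpha, gamma)
            else:
                q_learning_update(Q, s, a, r, s_next, done, alpha, gamma)
                a_next = epsilon_greedy_action(Q, s_next, epsilon, rng)
            if done:
                break
            s, a = s_next, a_next
    return Q


for name in ("q_learning", "sarsa"):
    for decay in (False, True):
        Q = train(name, decay=decay)
        print(name, "decay" if decay else "constant", [round(max(q), 3) for q in Q])
```

Recording the return per episode inside `train` would allow the convergence curves of the constant and decaying variants to be plotted side by side.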

The main focus of HW5 is on Deep Reinforcement Learning algorithms. For this purpose, the DQN algorithm is implemented with image observations processed by a CNN, and the impact of Transfer Learning in comparison to Ordinary Learning is examined.

MIT License

Languages

Jupyter Notebook 100.0%, Python 0.0%