
Exhaustive Reinforcement Learning

This repository aims to implement Deep Reinforcement Learning concepts exhaustively, covering most of the well-known resources, from textbooks to lectures. Each concept comes with concise explanatory notes and an implementation of the associated algorithms, along with their environments and peripheral modules. Key Reinforcement Learning papers and other worthwhile resources are cited at the end of this README.

Motivations

Pseudocode and Algorithms

Textbooks Taxonomy

Tabular Methods
- Bandit Problem
- Dynamic Programming
- Monte Carlo Methods
- Temporal-Difference Learning
- n-step Bootstrapping
- Planning and Learning
Approximate Solution Methods
- On-policy Prediction With Approximation
  - Gradient Monte Carlo
  - Semi-Gradient TD(0)
- On-policy Control With Approximation
  - Semi-Gradient SARSA
  - Semi-Gradient n-step SARSA
- Off-policy Control With Approximation
- Eligibility Traces
- Policy Gradient Methods
  - REINFORCE
  - one-step Actor-Critic
Deep Reinforcement Learning Methods
- Value-Based Methods
  - Neural Fitted Q-function (NFQ)
  - DQN
  - DDQN
  - Dueling DDQN
  - PER
  - C51
  - QR-DQN
  - HER
- Policy-Based Methods
  - REINFORCE
  - VPG
  - PPO
  - TRPO
- Stochastic Actor-Critic Methods
  - A2C
  - A3C
  - GAE
  - ACKTR
- Deterministic Actor-Critic Methods
  - Deep Deterministic Policy Gradient (DDPG)
  - TD3
  - SAC

Environments

Blackjack
- Monte Carlo Prediction
- Monte Carlo Exploring Starts
CartPole
- Fully Connected Q-function
- DQN
- DDQN
- Dueling DQN
Cliff Walking
- SARSA
- Q-Learning
- Expected SARSA
Gambler's Problem
- Value Iteration
Grid World
- Iterative Policy Evaluation
Jack's Car Rental
- Policy Iteration
Lunar Lander
- REINFORCE using Non-linear Approximation
- VPG
Small MDP (Maximization Bias)
- Q-Learning
- Double Q-Learning
Mountain Climbing
- Semi-Gradient SARSA
- Semi-Gradient n-step SARSA
Multi-Armed Bandit
- Simple Bandit
- Gradient Bandit
Pendulum Swing-Up
- Actor-Critic using Tile-coding
- Actor-Critic with Continuous Action Space
Random Walk
- n-step TD Prediction
- Gradient Monte Carlo State Aggregation
- Gradient Monte Carlo Tile Coding
- Semi-Gradient TD(0) State Aggregation
Short Corridor Gridworld
- REINFORCE (Policy Gradient) using Linear Approximation
- REINFORCE with Baseline
Windy Grid World
- SARSA

Relevant Resources

Textbooks

Courses

Artificial Intelligence
- UC Berkeley CS188: Introduction to Artificial Intelligence
Reinforcement Learning
Deep Reinforcement Learning

Useful Blogs

Articles

Better Exploration with Parameter Noise

Key Papers

Actor-Critic -
REINFORCE - Williams, R. J. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Mach. Learn. 8, 229–256 (1992).

Deep Reinforcement Learning

Contribution

When contributing to this repository, please first discuss the change you wish to make with me via an issue, email, or any other method.

About

Exhaustive Implementation of Algorithms, Key Papers, and Well-Known Problems of Reinforcement Learning