1 |
Introduction to Reinforcement Learning |
What is Reinforcement Learning? |
2 |
Introduction to ML, DL, and RL |
ML vs DL
Convolutional Network
Recurrent Neural Network
Reinforcement Learning
3 |
Mathematics for Reinforcement Learning |
Random Process
Markov Process
Markov Reward Process & Markov Decision Process
Optimization
Gradient Descent Algorithms
Optimization Algorithms for Training Deep Neural Networks
Information Theory
Parameter Estimation Concept
4 |
Reinforcement Learning Concept |
Reinforcement Learning Concept
Reinforcement Learning Components
Long-Term Reward and Value Function
5 |
MDP and DP |
Markov Decision Process
Dynamic Programming
Policy Evaluation
Optimal Policies Revisited
Finding Optimal Policies: Dynamic Programming
6 |
Model-Free Algorithms |
Model-Free RL
Monte-Carlo Method Prediction and Control
Monte-Carlo Policy Control
More on Exploration
Temporal Difference for Prediction
Temporal Differences Extended: N-Step Prediction
On-Policy Control: SARSA
Off-Policy Learning: Q-Learning
Comparison: SARSA and Q-Learning
Off-Policy Learning with Importance Sampling
7 |
Function Approximation |
Function Approximation
Incremental Methods
Coarse Coding
Prediction with Value Function Approximation
Control with Value Function Approximation
Batch Methods
8 |
Extensions of Q-Learning |
Key Variants and Extensions of Q-Learning
Fitted Q-Learning
Deep Q-Network
Double Q-Learning
Double DQN
Prioritized Experience Replay
Dueling Network Architectures
N-Step Q-Learning
Distributional vs. Distributed Q-Learning
Noisy Nets
Rainbow Q-Learning: Combining Improvements in Deep Reinforcement Learning
Asynchronous Q-Learning
Optimistic Q-Learning
Faster Deep Reinforcement Learning by Optimality Tightening
Practical Skills
9 |
Policy-Based Algorithms |
Policy Gradient
Policy Optimization
Policy Gradient
A Structure for REINFORCE and Actor-Critic
REINFORCE
Actor-Critic
Summary
10 |
Model-Based Reinforcement Learning |
Model-Based Reinforcement Learning
Model-Free and Model-Based Approaches: Integrated Architecture
Simulation for Planning
11 |
Case Studies in Policy-Based Algorithms |
Policy Gradient Theorem Revisited
A2C
A3C
PPO
DDPG