- Overview.
  - Reinforcement Learning [slides] [lecture note] [Video (in Chinese)].
  - Value-Based Learning [slides] [Video (in Chinese)].
  - Policy-Based Learning [slides] [Video (in Chinese)].
  - Actor-Critic Methods [slides] [Video (in Chinese)].
  - AlphaGo [slides] [Video (in Chinese)].

- TD Learning.
  - Sarsa [slides] [Video (in Chinese)].
  - Q-learning [slides] [Video (in Chinese)].
  - Multi-Step TD Target [slides] [Video (in Chinese)].

- Advanced Topics on Value-Based Learning.
  - Experience Replay (ER) & Prioritized ER [slides] [Video (in Chinese)].
  - Overestimation, Target Network, & Double DQN [slides] [Video (in Chinese)].
  - Dueling Networks [slides] [Video (in Chinese)].

- Policy Gradient with Baseline.
  - Policy Gradient with Baseline [slides] [Video (in Chinese)].
  - REINFORCE with Baseline [slides] [Video (in Chinese)].
  - Advantage Actor-Critic (A2C) [slides] [Video (in Chinese)].
  - REINFORCE versus A2C [slides] [Video (in Chinese)].

- Advanced Topics on Policy-Based Learning.
  - Trust-Region Policy Optimization (TRPO) [slides] [Video (in Chinese)].
  - Partial Observation and RNNs.

- Dealing with Continuous Action Space.
  - Discrete versus Continuous Control [slides] [Video (in Chinese)].
  - Deterministic Policy Gradient (DPG) for Continuous Control [slides] [Video (in Chinese)].
  - Stochastic Policy Gradient for Continuous Control [slides] [Video (in Chinese)].

- Multi-Agent Reinforcement Learning.
  - Basics and Challenges [slides] [Video (in Chinese)].
  - Centralized versus Decentralized [slides] [Video (in Chinese)].

- Imitation Learning.
  - Inverse Reinforcement Learning.
  - Generative Adversarial Imitation Learning (GAIL).