Deep Reinforcement Learning Workbook

Author: Robert Tjarko Lange | 2019

In this repository I document my self-study of Deep Reinforcement Learning. More specifically, I collect reading notes as well as reproduction attempts. The chronology of this repository is based on the amazing "Spinning Up in DRL" tutorial by OpenAI which in my mind is the best resource on SOTA DRL as of today.

Here are all papers and the corresponding notes that I got to read so far:

1. Deep Q-Learning

Read / Notes	Title & Author	Year	Category	Algorithm	Paper	Notes
13/05/19 #1 - 🔥	Playing Atari with Deep Reinforcement Learning, Mnih et al.	2013	Deep Q-Learning	DQN	Click	Click
01/12/18 #2 - 🔥	Deep Recurrent Q-Learning for Partially Observable MDPs, Hausknecht and Stone	2015	Deep Q-Learning	DRQN	Click	Click
16/05/19 #3 - 🔥	Deep Reinforcement Learning with Double Q-learning, van Hasselt et al.	2015	Deep Q-Learning	DDQN	Click	Click
17/05/19 #4 - 🔥	Prioritized Experience Replay, Schaul et al.	2016	Deep Q-Learning	PER	Click	Click
15/05/19 #5 - 🔥	Dueling Network Architectures for Deep Reinforcement Learning, Wang et al.	2016	Deep Q-Learning	Dueling DQN	Click	Click
17/05/19 #6 - 🔥	Rainbow: Combining Improvements in Deep Reinforcement Learning, Hessel et al.	2017	Deep Q-Learning	Rainbow	Click	Click
24/05/19 #7 - 🔥	Noisy Networks for Exploration, Fortunato et al.	2018	Deep Q-Learning	Noisy Nets	Click	Click
25/05/19 #8 - 🔥	A General Reinforcement Learning Algorithm that Masters Chess, Shogi and Go through Self-Play, Silver et al.	2019	Deep Q-Learning	AlphaZero	Click	Click
25/12/19 - 🔥	Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model, Schrittwieser et al.	2019	Deep Q-Learning	MuZero	Click	Click
29/07/19 #20 - 🌔	A Distributional Perspective on Reinforcement Learning, Bellemare et al.	2017	Distributional RL	C51	Click	Click
31/07/19 #21 - 🌔	Distributional Reinforcement Learning with Quantile Regression, Dabney et al.	2017	Distributional RL	QR-DQN	Click	Click
01/08/19 #22 - 🌔	Implicit Quantile Networks for Distributional Reinforcement Learning, Dabney et al.	2018	Distributional RL	IQN	Click	Click
02/08/19 #23 - 🌔	Deep Reinforcement Learning and the Deadly Triad, van Hasselt et al.	2018	Deep Q-Learning	-	Click	Click
09/08/19 #24 - 🌔	Towards Characterizing Divergence in Deep Q Learning, Achiam et al.	2019	Deep Q-Learning	PreQN	Click	Click
10/08/19 #25 - 🌔	Non-Delusional Q-Learning and Value Iteration, Lu et al.	2019	Deep Q-Learning	PCVI/PCQL	Click	Click
15/08/19 #26 - 🌔	Ray Interference: A source of plateaus in DRL, Schaul et al.	2019	Deep Q-Learning	-	Click	Click

2. Policy Gradient Methods

Read / Notes	Title & Author	Year	Category	Algorithm	Paper	Notes
25/05/019 #9 - 🔑	Asynchronous Methods for Deep Reinforcement Learning, Mnih et al.	2016	Policy Gradients	A3C	Click	Click
29/05/019 #10 - 🔑	Trust Region Policy Optimization, Schulman et al.	2015	Policy Gradients	TRPO	Click	Click
11/06/019 #11 - 🔑	High-Dimensional Continuous Control Using Generalized Advantage Estimation, Schulman et al.	2016	Policy Gradients	GAE	Click	Click
18/06/019 #12 - 🔑	Proximal Policy Optimization Algorithms, Schulman et al.	2017	Policy Gradients	PPO	Click	Click
20/06/19 #13 - 🔑	Emergence of Locomotion Behaviours in Rich Environments, Heess et al.	2017	Policy Gradients	-	Click	Click
20/06/19 #14 - 🔑	Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation, Wu et al.	2017	Policy Gradients	ACKTR	Click	Click
09/07/19 #15 - 🔑	Sample Efficient Actor-Critic with Experience Replay, Wang et al.	2016	Policy Gradients	ACER	Click	Click
11/07/19 #16 - 🔑	Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, Haarnoja et al.	2018	Policy Gradients	SAC	Click	Click
09/07/19 #17 - 🐒	Deterministic Policy Gradient Algorithms, Silver et al.	2014	Deterministic PG	DPG	Click	Click
10/07/19 #18 - 🐒	Continuous Control With Deep Reinforcement Learning, Lillicrap et al.	2015	Deterministic PG	DDPG	Click	Click
12/07/19 #19 - 🐒	Addressing Function Approximation Error in Actor-Critic Methods, Fujimoto et al.	2018	Deterministic PG	TD3	Click	Click

1. Model-Free RL: (d) Distributional RL

Dopamine: A Research Framework for Deep Reinforcement Learning, Anonymous, 2018.

1. Model-Free RL: (e) Policy Gradients with Action-Dependent Baselines

Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic, Gu et al, 2016. Algorithm: Q-Prop.
Action-depedent Control Variates for Policy Optimization via Stein’s Identity, Liu et al, 2017. Algorithm: Stein Control Variates.
The Mirage of Action-Dependent Baselines in Reinforcement Learning, Tucker et al, 2018. Contribution: interestingly, critiques and reevaluates claims from earlier papers (including Q-Prop and stein control variates) and finds important methodological errors in them.

1. Model-Free RL: (f) Path-Consistency Learning

Bridging the Gap Between Value and Policy Based Reinforcement Learning, Nachum et al, 2017.
Trust-PCL: An Off-Policy Trust Region Method for Continuous Control, Nachum et al, 2017.

1. Model-Free RL: (g) Other Directions for Combining Policy-Learning and Q-Learning

Combining Policy Gradient and Q-learning, O’Donoghue et al, 2016.
The Reactor: A Fast and Sample-Efficient Actor-Critic Agent for Reinforcement Learning, Gruslys et al, 2017.
Interpolated Policy Gradient: Merging On-Policy and Off-Policy Gradient Estimation for Deep Reinforcement Learning, Gu et al, 2017.
Equivalence Between Policy Gradients and Soft Q-Learning, Schulman et al, 2017.

1. Model-Free RL: (h) Evolutionary Algorithms

Evolution Strategies as a Scalable Alternative to Reinforcement Learning, Salimans et al, 2017.

2. Exploration

Read / Notes	Title & Author	Paper	Notes
# ❓	VIME: Variational Information Maximizing Exploration, Houthooft et al, 2016. Algorithm: VIME.	Click	Click
# ❓	Unifying Count-Based Exploration and Intrinsic Motivation, Bellemare et al, 2016. Algorithm: CTS-based Pseudocounts.	Click	Click
# ❓	Count-Based Exploration with Neural Density Models, Ostrovski et al, 2017. Algorithm: PixelCNN-based Pseudocounts.	Click	Click
# ❓	Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning, Tang et al, 2016. Algorithm: Hash-based Counts.	Click	Click
# ❓	EX2: Exploration with Exemplar Models for Deep Reinforcement Learning, Fu et al, 2017. Algorithm: EX2.	Click	Click
# ❓	Curiosity-driven Exploration by Self-supervised Prediction, Pathak et al, 2017. Algorithm: Intrinsic Curiosity Module (ICM).	Click	Click
# ❓	Large-Scale Study of Curiosity-Driven Learning, Burda et al, 2018. Contribution: Systematic analysis of how surprisal-based intrinsic motivation performs in a wide variety of environments.	Click	Click
# ❓	Exploration by Random Network Distillation, Burda et al, 2018. Algorithm: RND.	Click	Click
# ❓	Variational Intrinsic Control, Gregor et al, 2016. Algorithm: VIC.	Click	Click
# ❓	Diversity is All You Need: Learning Skills without a Reward Function, Eysenbach et al, 2018. Algorithm: DIAYN.	Click	Click
# ❓	Variational Option Discovery Algorithms, Achiam et al, 2018. Algorithm: VALOR.	Click	Click

3. Transfer and Multitask RL

Read / Notes	Title & Author	Year	Category	Algorithm	Paper	Notes
# ❓	Progressive Neural Networks, Rusu et al, 2016. Algorithm: Progressive Networks.				Click	Click
# ❓	Universal Value Function Approximators, Schaul et al, 2015. Algorithm: UVFA.				Click	Click
04/11/19 #3 - 😄	Reinforcement Learning with Unsupervised Auxiliary Tasks, Jaderberg et al	2016	Auxiliary	UNREAL	Click	Click
# ❓	The Intentional Unintentional Agent: Learning to Solve Many Continuous Control Tasks Simultaneously, Cabi et al, 2017. Algorithm: IU Agent.				Click	Click
# ❓	PathNet: Evolution Channels Gradient Descent in Super Neural Networks, Fernando et al, 2017. Algorithm: PathNet.				Click	Click
# ❓	Mutual Alignment Transfer Learning, Wulfmeier et al, 2017. Algorithm: MATL.				Click	Click
# ❓	Learning an Embedding Space for Transferable Robot Skills, Hausman et al, 2018.				Click	Click
# ❓	Hindsight Experience Replay, Andrychowicz et al, 2017. Algorithm: Hindsight Experience Replay (HER).				Click	Click

4. Hierarchy

Read / Notes	Title & Author	Paper	Notes
# ❓	Strategic Attentive Writer for Learning Macro-Actions, Vezhnevets et al, 2016. Algorithm: STRAW.	Click	Click
# ❓	FeUdal Networks for Hierarchical Reinforcement Learning, Vezhnevets et al, 2017. Algorithm: Feudal Networks	Click	Click
# ❓	Data-Efficient Hierarchical Reinforcement Learning, Nachum et al, 2018. Algorithm: HIRO.	Click	Click

5. Memory

Read / Notes	Title & Author	Paper	Notes
# ❓	Model-Free Episodic Control, Blundell et al, 2016. Algorithm: MFEC.	Click	Click
# ❓	Neural Episodic Control, Pritzel et al, 2017. Algorithm: NEC.	Click	Click
# ❓	Neural Map: Structured Memory for Deep Reinforcement Learning, Parisotto and Salakhutdinov, 2017. Algorithm: Neural Map.	Click	Click
# ❓	Unsupervised Predictive Memory in a Goal-Directed Agent, Wayne et al, 2018. Algorithm: MERLIN.	Click	Click
# ❓	Relational Recurrent Neural Networks, Santoro et al, 2018. Algorithm: RMC.	Click	Click

6. Model-Based RL

Read / Notes	Title & Author	Year	Category	Algorithm	Paper	Notes
30/12/19 # 😄	Dream to Control: Learning Behaviors by Latent Imagination, Hafner et al.	2019	Model Learning	Dreamer	Click	Click
# ❓	Imagination-Augmented Agents for Deep Reinforcement Learning, Weber et al, 2017. Algorithm: I2A.				Click	Click
# ❓	Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning, Nagabandi et al, 2017. Algorithm: MBMF.				Click	Click
# ❓	Model-Based Value Expansion for Efficient Model-Free Reinforcement Learning, Feinberg et al, 2018. Algorithm: MVE.				Click	Click
# ❓	Sample-Efficient Reinforcement Learning with Stochastic Ensemble Value Expansion, Buckman et al, 2018. Algorithm: STEVE.				Click	Click
# ❓	Model-Ensemble Trust-Region Policy Optimization, Kurutach et al, 2018. Algorithm: ME-TRPO.				Click	Click
# ❓	Model-Based Reinforcement Learning via Meta-Policy Optimization, Clavera et al, 2018. Algorithm: MB-MPO.				Click	Click
# ❓	Recurrent World Models Facilitate Policy Evolution, Ha and Schmidhuber, 2018.				Click	Click
# ❓	Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm, Silver et al, 2017. Algorithm: AlphaZero.				Click	Click
# ❓	Thinking Fast and Slow with Deep Learning and Tree Search, Anthony et al, 2017. Algorithm: ExIt.				Click	Click

7. Meta-RL

Read / Notes	Title & Author	Year	Category	Algorithm	Paper	Notes
14/11/19 - #1 😄	RL^2: Fast Reinforcement Learning via Slow Reinforcement Learning, Duan et al, 2016. Algorithm: RL^2.				Click	Click
06/01/20 - #2 😄	Meta-Learners' Learning Dynamics are unlike learners', Rabinowitz	2019	Learning Dynamics	-	Click	Click
# ❓	Learning to Reinforcement Learn, Wang et al, 2016.				Click	Click
# ❓	Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks, Finn et al, 2017. Algorithm: MAML.				Click	Click
# ❓	A Simple Neural Attentive Meta-Learner, Mishra et al, 2018. Algorithm: SNAIL.				Click	Click

8. Scaling RL

Read / Notes	Title & Author	Year	Category	Algorithm	Paper	Notes
# 😄 - 23/12/19	Human-Level performance in first-person multiplayer games with population-based DRL, Jaderberg et al.	2018	Multi-Agent	PBT	Click	Click
# 😄 - 24/12/19	Grandmaster level in Starcraft II using MARL, Vinyals et al.	2019	Multi-Agent	League	Click	Click
# 😄 - 22/12/19	Emergent Tool Use From Multi-Agent Autocurricula by Baker et al.	Self-Play	2019	CT-DE	Paper	Notes
# ❓	Accelerated Methods for Deep Reinforcement Learning, Stooke and Abbeel, 2018. Contribution: Systematic analysis of parallelization in deep RL across algorithms.				Click	Click
# ❓	IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures, Espeholt et al, 2018. Algorithm: IMPALA.				Click	Click
# ❓	Distributed Prioritized Experience Replay, Horgan et al, 2018. Algorithm: Ape-X.				Click	Click
# ❓	Recurrent Experience Replay in Distributed Reinforcement Learning, Anonymous, 2018. Algorithm: R2D2.				Click	Click
# ❓	RLlib: Abstractions for Distributed Reinforcement Learning, Liang et al, 2017. Contribution: A scalable library of RL algorithm implementations. Documentation link.				Click	Click

9. RL in the Real World

Read / Notes	Title & Author	Year	Category	Algorithm	Paper	Notes
#1 😄	Solving Rubik's Cubes with a Robotic Hand	2019	Robotics	ADR	Click	Click
# ❓	Benchmarking Reinforcement Learning Algorithms on Real-World Robots, Mahmood et al, 2018.				Click	Click
# ❓	Learning Dexterous In-Hand Manipulation, OpenAI, 2018.				Click	Click
# ❓	QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation, Kalashnikov et al, 2018. Algorithm: QT-Opt.				Click	Click
# ❓	Horizon: Facebook’s Open Source Applied Reinforcement Learning Platform, Gauci et al, 2018.				Click	Click

10. Safety

Read / Notes	Title & Author	Paper	Notes
# ❓	Concrete Problems in AI Safety, Amodei et al, 2016. Contribution: establishes a taxonomy of safety problems, serving as an important jumping-off point for future research. We need to solve these!	Click	Click
# ❓	Deep Reinforcement Learning From Human Preferences, Christiano et al, 2017. Algorithm: LFP.	Click	Click
# ❓	Constrained Policy Optimization, Achiam et al, 2017. Algorithm: CPO.	Click	Click
# ❓	Safe Exploration in Continuous Action Spaces, Dalal et al, 2018. Algorithm: DDPG+Safety Layer.	Click	Click
# ❓	Trial without Error: Towards Safe Reinforcement Learning via Human Intervention, Saunders et al, 2017. Algorithm: HIRL.	Click	Click
# ❓	Leave No Trace: Learning to Reset for Safe and Autonomous Reinforcement Learning, Eysenbach et al, 2017. Algorithm: Leave No Trace.	Click	Click

11. Imitation Learning and Inverse Reinforcement Learning

Read / Notes	Title & Author	Year	Category	Algorithm	Paper	Notes
05/11/19 #1 - 😄	Distilling Policy Distillation, Czarnecki et al.	2019	Distillation	ExpEntropyReg	Click	Click
# ❓	Modeling Purposeful Adaptive Behavior with the Principle of Maximum Causal Entropy, Ziebart 2010. Contributions: Crisp formulation of maximum entropy IRL.				Click	Click
# ❓	Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization, Finn et al, 2016. Algorithm: GCL.				Click	Click
# ❓	Generative Adversarial Imitation Learning, Ho and Ermon, 2016. Algorithm: GAIL.				Click	Click
# ❓	DeepMimic: Example-Guided Deep Reinforcement Learning of Physics-Based Character Skills, Peng et al, 2018. Algorithm: DeepMimic.				Click	Click
# ❓	Variational Discriminator Bottleneck: Improving Imitation Learning, Inverse RL, and GANs by Constraining Information Flow, Peng et al, 2018. Algorithm: VAIL.				Click	Click
# ❓	One-Shot High-Fidelity Imitation: Training Large-Scale Deep Nets with RL, Le Paine et al, 2018. Algorithm: MetaMimic.				Click	Click

12. Reproducibility, Analysis, and Critique

Read / Notes	Title & Author	Paper	Notes
# ❓	Benchmarking Deep Reinforcement Learning for Continuous Control, Duan et al, 2016. Contribution: rllab.	Click	Click
# ❓	Reproducibility of Benchmarked Deep Reinforcement Learning Tasks for Continuous Control, Islam et al, 2017.	Click	Click
# ❓	Deep Reinforcement Learning that Matters, Henderson et al, 2017.
# ❓	Where Did My Optimum Go?: An Empirical Analysis of Gradient Descent Optimization in Policy Gradient Methods, Henderson et al, 2018.	Click	Click
# ❓	Are Deep Policy Gradient Algorithms Truly Policy Gradient Algorithms?, Ilyas et al, 2018.	Click	Click
# ❓	Simple Random Search Provides a Competitive Approach to Reinforcement Learning, Mania et al, 2018.	Click	Click
# ❓	Benchmarking Model-Based Reinforcement Learning, Wang et al, 2019.	Click	Click

13. Multi-Agent RL

Read / Notes	Title & Author	Year	Category	Algorithm	Paper	Notes
# 😄	Social Influence as Intrinsic Motivation for MA-DRL, Jaques et al.	2019	Social Influence	MOA	Click	Click

14. KL-Regularized RL

Read / Notes	Title & Author	Year	Category	Algorithm	Paper	Notes
# 😄	Information Asymmetry in KL-Regularized RL, Galashov et al.	2019	Learned Priors	-	Click	Click
# 😄	Neural Probabilistic Motor Primitives, Merel et al.	2019	Few-Shot Transfer	NPMP	Click	Click

15. Bonus: Classic Papers in RL Theory or Review

Read / Notes	Title & Author	Paper	Notes
# ❓	Policy Gradient Methods for Reinforcement Learning with Function Approximation, Sutton et al, 2000. Contributions: Established policy gradient theorem and showed convergence of policy gradient algorithm for arbitrary policy classes.	Click	Click
# ❓	An Analysis of Temporal-Difference Learning with Function Approximation, Tsitsiklis and Van Roy, 1997. Contributions: Variety of convergence results and counter-examples for value-learning methods in RL.	Click	Click
# ❓	Reinforcement Learning of Motor Skills with Policy Gradients, Peters and Schaal, 2008. Contributions: Thorough review of policy gradient methods at the time, many of which are still serviceable descriptions of deep RL methods.	Click	Click
# ❓	Approximately Optimal Approximate Reinforcement Learning, Kakade and Langford, 2002. Contributions: Early roots for monotonic improvement theory, later leading to theoretical justification for TRPO and other algorithms.	Click	Click
# ❓	A Natural Policy Gradient, Kakade, 2002. Contributions: Brought natural gradients into RL, later leading to TRPO, ACKTR, and several other methods in deep RL.	Click	Click
# ❓	Algorithms for Reinforcement Learning, Szepesvari, 2009. Contributions: Unbeatable reference on RL before deep RL, containing foundations and theoretical background.	Click	Click

RobertTLange / spinningup-workspace