Deep Reinforcement Learning Workbook
Author: Robert Tjarko Lange | 2019
In this repository I document my self-study of Deep Reinforcement Learning. More specifically, I collect reading notes as well as reproduction attempts. The chronology of this repository follows the amazing "Spinning Up in DRL" tutorial by OpenAI, which in my mind is the best resource on state-of-the-art DRL available today.
Here are all the papers I have read so far, along with the corresponding notes:
1. Model-Free RL: (a) Deep Q-Learning
- Playing Atari with Deep Reinforcement Learning, Mnih et al, 2013.
- Deep Recurrent Q-Learning for Partially Observable MDPs, Hausknecht and Stone, 2015.
- Deep Reinforcement Learning with Double Q-learning, van Hasselt et al, 2015.
- Prioritized Experience Replay, Schaul et al, 2016.
- Dueling Network Architectures for Deep Reinforcement Learning, Wang et al, 2016.
- Rainbow: Combining Improvements in Deep Reinforcement Learning, Hessel et al, 2017.
Supplementary Papers
- Noisy Networks for Exploration, Fortunato et al, 2018.
- A General Reinforcement Learning Algorithm that Masters Chess, Shogi and Go through Self-Play, Silver et al, 2018.
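To keep the core mechanic of this block of papers at hand, here is a minimal pure-Python sketch of the bootstrapped target used by DQN and Double DQN. The function name and the list-based Q-values are my own illustration; real implementations operate on batched network outputs:

```python
def td_target(reward, next_q_online, next_q_target, gamma=0.99, double=True):
    """Bootstrapped target for (Double) Q-learning on a single transition.

    next_q_online / next_q_target: Q-values for every action in the next
    state, as produced by the online and the target network respectively.
    """
    if double:
        # Double DQN (van Hasselt et al): the online network selects the
        # action, the target network evaluates it, reducing overestimation.
        a_star = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
        bootstrap = next_q_target[a_star]
    else:
        # Vanilla DQN: the target network both selects and evaluates,
        # which is prone to overestimating action values.
        bootstrap = max(next_q_target)
    return reward + gamma * bootstrap
```

Note how decoupling selection from evaluation changes the target whenever the two networks disagree on the best next action.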
1. Model-Free RL: (b) Policy Gradients
- Asynchronous Methods for Deep Reinforcement Learning, Mnih et al, 2016.
- Trust Region Policy Optimization, Schulman et al, 2015.
- High-Dimensional Continuous Control Using Generalized Advantage Estimation, Schulman et al, 2016. Algorithm: GAE.
- Proximal Policy Optimization Algorithms, Schulman et al, 2017.
- Emergence of Locomotion Behaviours in Rich Environments, Heess et al, 2017.
- Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation, Wu et al, 2017.
- Sample Efficient Actor-Critic with Experience Replay, Wang et al, 2016.
- Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, Haarnoja et al, 2018. Algorithm: SAC.
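Two recurring building blocks in this subsection are GAE and PPO's clipped surrogate objective. A minimal sketch in plain Python, with list inputs standing in for batched tensors (my own simplification):

```python
def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation (Schulman et al, 2016).

    values must contain one extra entry: the bootstrap value of the
    state following the last reward.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        # One-step TD error, accumulated backwards with decay gamma * lam.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Per-sample PPO clipped surrogate (Schulman et al, 2017), to be
    maximized; ratio is pi_new(a|s) / pi_old(a|s)."""
    clipped_ratio = max(min(ratio, 1.0 + eps), 1.0 - eps)
    # Taking the min makes the objective pessimistic: large policy
    # updates receive no extra credit beyond the clipping range.
    return min(ratio * advantage, clipped_ratio * advantage)
```

With `lam=1.0` GAE reduces to Monte-Carlo advantages; with `lam=0.0` to one-step TD errors.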
1. Model-Free RL: (c) Deterministic Policy Gradients
- Deterministic Policy Gradient Algorithms, Silver et al, 2014.
- Continuous Control With Deep Reinforcement Learning, Lillicrap et al, 2015.
- Addressing Function Approximation Error in Actor-Critic Methods, Fujimoto et al, 2018.
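The TD3 paper in particular combats critic overestimation with a clipped double-Q target plus target-policy smoothing. A single-transition sketch; the scalar action and hand-rolled clipping are my simplifications:

```python
import random

def td3_target(reward, next_q1, next_q2, gamma=0.99):
    """Clipped double-Q target (Fujimoto et al, 2018): bootstrapping from
    the minimum of two critics curbs value overestimation."""
    return reward + gamma * min(next_q1, next_q2)


def smoothed_target_action(mu, noise_std=0.2, noise_clip=0.5, act_limit=1.0):
    """Target-policy smoothing: add clipped Gaussian noise to the target
    policy's action so the critic cannot exploit narrow value peaks."""
    noise = max(min(random.gauss(0.0, noise_std), noise_clip), -noise_clip)
    return max(min(mu + noise, act_limit), -act_limit)
```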
1. Model-Free RL: (d) Distributional RL
- A Distributional Perspective on Reinforcement Learning, Bellemare et al, 2017.
- Distributional Reinforcement Learning with Quantile Regression, Dabney et al, 2017.
- Implicit Quantile Networks for Distributional Reinforcement Learning, Dabney et al, 2018.
- Dopamine: A Research Framework for Deep Reinforcement Learning, Castro et al, 2018.
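The quantile-based papers in this subsection minimize an asymmetric Huber loss over return quantiles. A minimal single-sample sketch (the function name is mine):

```python
def quantile_huber_loss(theta, target, tau, kappa=1.0):
    """Quantile regression Huber loss (Dabney et al, 2017) for a single
    predicted quantile theta at level tau against one target sample.

    The |tau - indicator| weight penalizes over- and under-estimation
    asymmetrically, which is what makes theta converge to the
    tau-quantile of the return distribution instead of its mean.
    """
    u = target - theta
    # Huber: quadratic near zero, linear beyond kappa.
    huber = 0.5 * u * u if abs(u) <= kappa else kappa * (abs(u) - 0.5 * kappa)
    indicator = 1.0 if u < 0 else 0.0
    return abs(tau - indicator) * huber
```

At `tau=0.5` the loss is symmetric; at `tau=0.9` underestimating the target costs nine times more than overestimating it by the same amount.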
1. Model-Free RL: (e) Policy Gradients with Action-Dependent Baselines
- Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic, Gu et al, 2016. Algorithm: Q-Prop.
- Action-dependent Control Variates for Policy Optimization via Stein’s Identity, Liu et al, 2017. Algorithm: Stein Control Variates.
- The Mirage of Action-Dependent Baselines in Reinforcement Learning, Tucker et al, 2018. Contribution: critiques and re-evaluates claims from earlier papers (including Q-Prop and Stein control variates) and finds important methodological errors in them.
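The common thread in (e) is variance reduction via control variates. As a reminder of why baselines help at all, here is a toy Monte-Carlo check. It uses a state-independent baseline on a Gaussian score-function estimator; the setup is my own, and the papers above extend this idea to action-dependent baselines:

```python
import random

def reinforce_samples(n, baseline, seed=0):
    """Score-function gradient samples for the mean of a N(0, 1) policy
    with toy return R(a) = a + 5; the score w.r.t. the mean is a itself,
    so each sample is a * (R(a) - baseline)."""
    rng = random.Random(seed)
    return [a * ((a + 5.0) - baseline)
            for a in (rng.gauss(0.0, 1.0) for _ in range(n))]

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)
```

Subtracting the mean return (here 5.0) leaves the gradient estimate unbiased but shrinks its variance dramatically.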
1. Model-Free RL: (f) Path-Consistency Learning
- Bridging the Gap Between Value and Policy Based Reinforcement Learning, Nachum et al, 2017.
- Trust-PCL: An Off-Policy Trust Region Method for Continuous Control, Nachum et al, 2017.
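Both PCL papers build on a soft consistency condition along trajectories; the one-step case looks roughly like this (my own minimal transcription):

```python
def path_consistency_residual(v_s, v_next, reward, log_pi, gamma=0.99, tau=0.01):
    """One-step soft path-consistency residual (Nachum et al, 2017).

    Under the optimal entropy-regularized policy this residual is zero
    for every transition; PCL minimizes its square over both on-policy
    and off-policy trajectories.
    """
    return -v_s + reward - tau * log_pi + gamma * v_next
```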
1. Model-Free RL: (g) Other Directions for Combining Policy-Learning and Q-Learning
- Combining Policy Gradient and Q-learning, O’Donoghue et al, 2016.
- The Reactor: A Fast and Sample-Efficient Actor-Critic Agent for Reinforcement Learning, Gruslys et al, 2017.
- Interpolated Policy Gradient: Merging On-Policy and Off-Policy Gradient Estimation for Deep Reinforcement Learning, Gu et al, 2017.
- Equivalence Between Policy Gradients and Soft Q-Learning, Schulman et al, 2017.
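The equivalence result by Schulman et al hinges on the soft (entropy-regularized) value function and its Boltzmann policy. In code the pair looks like this; the list-valued Q-function is my simplification:

```python
import math

def soft_value_and_policy(q_values, tau=1.0):
    """Soft value V = tau * log sum_a exp(Q(a) / tau) and the matching
    Boltzmann policy pi(a) = exp((Q(a) - V) / tau)."""
    # Subtract the max before exponentiating for numerical stability.
    q_max = max(q_values)
    v = q_max + tau * math.log(sum(math.exp((q - q_max) / tau) for q in q_values))
    policy = [math.exp((q - v) / tau) for q in q_values]
    return v, policy
```

As `tau` goes to zero the soft value approaches `max(q_values)` and the policy approaches the greedy one.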
1. Model-Free RL: (h) Evolutionary Algorithms
- Evolution Strategies as a Scalable Alternative to Reinforcement Learning, Salimans et al, 2017.
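Salimans et al estimate policy gradients with nothing but black-box fitness evaluations. A minimal antithetic-sampling sketch; the toy fitness function and names below are mine:

```python
import random

def es_gradient(theta, fitness, sigma=0.1, n_pairs=50, seed=0):
    """Antithetic evolution-strategies gradient estimate for a parameter
    list theta under a black-box, scalar-valued fitness function."""
    rng = random.Random(seed)
    grad = [0.0] * len(theta)
    for _ in range(n_pairs):
        # Evaluate mirrored perturbations theta +/- sigma * eps.
        eps = [rng.gauss(0.0, 1.0) for _ in theta]
        f_plus = fitness([t + sigma * e for t, e in zip(theta, eps)])
        f_minus = fitness([t - sigma * e for t, e in zip(theta, eps)])
        scale = (f_plus - f_minus) / (2.0 * sigma * n_pairs)
        for i, e in enumerate(eps):
            grad[i] += scale * e
    return grad
```

In the paper this loop is distributed across workers that only exchange scalar fitness values and shared random seeds.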