RobertTLange / spinningup-workspace

Reading notes & PyTorch experiments on OpenAI's "Spinning Up in DRL" tutorial.


Deep Reinforcement Learning Workbook

Author: Robert Tjarko Lange | 2019

In this repository I document my self-study of Deep Reinforcement Learning. More specifically, I collect reading notes as well as reproduction attempts. The chronology of this repository follows the excellent "Spinning Up in DRL" tutorial by OpenAI, which, to my mind, remains the best resource on state-of-the-art DRL available today.

Here are all the papers I have read so far, along with the corresponding notes:

1. Deep Q-Learning

| Read / Notes | Title & Author | Year | Category | Algorithm | Paper | Notes |
| --- | --- | --- | --- | --- | --- | --- |
| 13/05/19 #1 🔥 | Playing Atari with Deep Reinforcement Learning, Mnih et al. | 2013 | Deep Q-Learning | DQN | Click | Click |
| 01/12/18 #2 🔥 | Deep Recurrent Q-Learning for Partially Observable MDPs, Hausknecht and Stone | 2015 | Deep Q-Learning | DRQN | Click | Click |
| 16/05/19 #3 🔥 | Deep Reinforcement Learning with Double Q-learning, van Hasselt et al. | 2015 | Deep Q-Learning | DDQN | Click | Click |
| 17/05/19 #4 🔥 | Prioritized Experience Replay, Schaul et al. | 2016 | Deep Q-Learning | PER | Click | Click |
| 15/05/19 #5 🔥 | Dueling Network Architectures for Deep Reinforcement Learning, Wang et al. | 2016 | Deep Q-Learning | Dueling DQN | Click | Click |
| 17/05/19 #6 🔥 | Rainbow: Combining Improvements in Deep Reinforcement Learning, Hessel et al. | 2017 | Deep Q-Learning | Rainbow | Click | Click |
| 24/05/19 #7 🔥 | Noisy Networks for Exploration, Fortunato et al. | 2018 | Deep Q-Learning | Noisy Nets | Click | Click |
| 25/05/19 #8 🔥 | A General Reinforcement Learning Algorithm that Masters Chess, Shogi and Go through Self-Play, Silver et al. | 2019 | Deep Q-Learning | AlphaZero | Click | Click |
| 25/12/19 🔥 | Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model, Schrittwieser et al. | 2019 | Deep Q-Learning | MuZero | Click | Click |
| 29/07/19 #20 🌔 | A Distributional Perspective on Reinforcement Learning, Bellemare et al. | 2017 | Distributional RL | C51 | Click | Click |
| 31/07/19 #21 🌔 | Distributional Reinforcement Learning with Quantile Regression, Dabney et al. | 2017 | Distributional RL | QR-DQN | Click | Click |
| 01/08/19 #22 🌔 | Implicit Quantile Networks for Distributional Reinforcement Learning, Dabney et al. | 2018 | Distributional RL | IQN | Click | Click |
| 02/08/19 #23 🌔 | Deep Reinforcement Learning and the Deadly Triad, van Hasselt et al. | 2018 | Deep Q-Learning | - | Click | Click |
| 09/08/19 #24 🌔 | Towards Characterizing Divergence in Deep Q-Learning, Achiam et al. | 2019 | Deep Q-Learning | PreQN | Click | Click |
| 10/08/19 #25 🌔 | Non-Delusional Q-Learning and Value Iteration, Lu et al. | 2019 | Deep Q-Learning | PCVI/PCQL | Click | Click |
| 15/08/19 #26 🌔 | Ray Interference: A Source of Plateaus in DRL, Schaul et al. | 2019 | Deep Q-Learning | - | Click | Click |
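
As a quick reminder of the core idea behind papers #1 and #3 above, here is a minimal, dependency-free sketch of the TD targets used by DQN and Double DQN. The toy Q-value lists stand in for the network outputs; the numbers are made up for illustration.

```python
# Minimal sketch of the TD targets behind DQN (Mnih et al. 2013) and
# Double DQN (van Hasselt et al. 2015). Lists of per-action Q-values
# stand in for the target and online networks.
GAMMA = 0.99  # discount factor (a common default, not from a specific paper)

def dqn_target(reward, q_target_next):
    # Vanilla DQN: bootstrap from the max of the *target* network's estimates.
    return reward + GAMMA * max(q_target_next)

def double_dqn_target(reward, q_online_next, q_target_next):
    # Double DQN: the *online* net selects the action, the *target* net
    # evaluates it - this decoupling reduces the max-operator's upward bias.
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return reward + GAMMA * q_target_next[best_action]

# Toy next-state estimates for 3 actions; the online net overestimates action 1:
q_online = [1.0, 2.0, 0.5]
q_target = [1.1, 0.9, 0.6]
print(dqn_target(0.0, q_target))                    # 0.99 * 1.1
print(double_dqn_target(0.0, q_online, q_target))   # 0.99 * 0.9
```

Note how the vanilla target bootstraps from the largest target-net value, while the double target is pulled down because the online net's greedy action scores lower under the target net.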

2. Policy Gradient Methods

| Read / Notes | Title & Author | Year | Category | Algorithm | Paper | Notes |
| --- | --- | --- | --- | --- | --- | --- |
| 25/05/19 #9 🔑 | Asynchronous Methods for Deep Reinforcement Learning, Mnih et al. | 2016 | Policy Gradients | A3C | Click | Click |
| 29/05/19 #10 🔑 | Trust Region Policy Optimization, Schulman et al. | 2015 | Policy Gradients | TRPO | Click | Click |
| 11/06/19 #11 🔑 | High-Dimensional Continuous Control Using Generalized Advantage Estimation, Schulman et al. | 2016 | Policy Gradients | GAE | Click | Click |
| 18/06/19 #12 🔑 | Proximal Policy Optimization Algorithms, Schulman et al. | 2017 | Policy Gradients | PPO | Click | Click |
| 20/06/19 #13 🔑 | Emergence of Locomotion Behaviours in Rich Environments, Heess et al. | 2017 | Policy Gradients | - | Click | Click |
| 20/06/19 #14 🔑 | Scalable Trust-Region Method for Deep Reinforcement Learning Using Kronecker-Factored Approximation, Wu et al. | 2017 | Policy Gradients | ACKTR | Click | Click |
| 09/07/19 #15 🔑 | Sample Efficient Actor-Critic with Experience Replay, Wang et al. | 2016 | Policy Gradients | ACER | Click | Click |
| 11/07/19 #16 🔑 | Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, Haarnoja et al. | 2018 | Policy Gradients | SAC | Click | Click |
| 09/07/19 #17 💍 | Deterministic Policy Gradient Algorithms, Silver et al. | 2014 | Deterministic PG | DPG | Click | Click |
| 10/07/19 #18 💍 | Continuous Control With Deep Reinforcement Learning, Lillicrap et al. | 2015 | Deterministic PG | DDPG | Click | Click |
| 12/07/19 #19 💍 | Addressing Function Approximation Error in Actor-Critic Methods, Fujimoto et al. | 2018 | Deterministic PG | TD3 | Click | Click |
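
To illustrate the key mechanism of paper #12 above, here is a sketch of PPO's clipped surrogate objective for a single sample. It is not a full training loop; `logp_new`/`logp_old` denote the log-probability of the taken action under the current and the data-collecting policy, and the epsilon value is the common default from Schulman et al. 2017.

```python
import math

CLIP_EPS = 0.2  # clipping range epsilon (default reported in the PPO paper)

def ppo_clip_objective(logp_new, logp_old, advantage):
    # Probability ratio r = pi_new(a|s) / pi_old(a|s), via log-probs.
    ratio = math.exp(logp_new - logp_old)
    # clip(r, 1 - eps, 1 + eps)
    clipped = max(1.0 - CLIP_EPS, min(ratio, 1.0 + CLIP_EPS))
    # Pessimistic bound: taking the min removes any incentive to push the
    # ratio far outside [1 - eps, 1 + eps] in the direction that helps.
    return min(ratio * advantage, clipped * advantage)

# With a positive advantage, gains from doubling the action probability
# are capped at the clip boundary:
print(ppo_clip_objective(math.log(2.0), math.log(1.0), 1.0))  # 1.2, not 2.0
```

With a negative advantage the min picks the unclipped term instead, so the objective never understates how bad a large policy change is.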

1. Model-Free RL: (d) Distributional RL

  • Dopamine: A Research Framework for Deep Reinforcement Learning, Anonymous, 2018.

1. Model-Free RL: (e) Policy Gradients with Action-Dependent Baselines

  • Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic, Gu et al, 2016. Algorithm: Q-Prop.
  • Action-dependent Control Variates for Policy Optimization via Stein's Identity, Liu et al, 2017. Algorithm: Stein Control Variates.
  • The Mirage of Action-Dependent Baselines in Reinforcement Learning, Tucker et al, 2018. Contribution: critiques and re-evaluates claims from earlier papers (including Q-Prop and Stein Control Variates) and finds important methodological errors in them.

1. Model-Free RL: (f) Path-Consistency Learning

  • Bridging the Gap Between Value and Policy Based Reinforcement Learning, Nachum et al, 2017.
  • Trust-PCL: An Off-Policy Trust Region Method for Continuous Control, Nachum et al, 2017.

1. Model-Free RL: (g) Other Directions for Combining Policy-Learning and Q-Learning

  • Combining Policy Gradient and Q-learning, O'Donoghue et al, 2016.
  • The Reactor: A Fast and Sample-Efficient Actor-Critic Agent for Reinforcement Learning, Gruslys et al, 2017.
  • Interpolated Policy Gradient: Merging On-Policy and Off-Policy Gradient Estimation for Deep Reinforcement Learning, Gu et al, 2017.
  • Equivalence Between Policy Gradients and Soft Q-Learning, Schulman et al, 2017.

1. Model-Free RL: (h) Evolutionary Algorithms

  • Evolution Strategies as a Scalable Alternative to Reinforcement Learning, Salimans et al, 2017.

2. Exploration

| Read / Notes | Title & Author | Year | Category | Algorithm | Paper | Notes |
| --- | --- | --- | --- | --- | --- | --- |
| ❓ | VIME: Variational Information Maximizing Exploration, Houthooft et al. | 2016 | - | VIME | Click | Click |
| ❓ | Unifying Count-Based Exploration and Intrinsic Motivation, Bellemare et al. | 2016 | - | CTS-based Pseudocounts | Click | Click |
| ❓ | Count-Based Exploration with Neural Density Models, Ostrovski et al. | 2017 | - | PixelCNN-based Pseudocounts | Click | Click |
| ❓ | Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning, Tang et al. | 2016 | - | Hash-based Counts | Click | Click |
| ❓ | EX2: Exploration with Exemplar Models for Deep Reinforcement Learning, Fu et al. | 2017 | - | EX2 | Click | Click |
| ❓ | Curiosity-driven Exploration by Self-supervised Prediction, Pathak et al. | 2017 | - | Intrinsic Curiosity Module (ICM) | Click | Click |
| ❓ | Large-Scale Study of Curiosity-Driven Learning, Burda et al. Contribution: systematic analysis of how surprisal-based intrinsic motivation performs in a wide variety of environments. | 2018 | - | - | Click | Click |
| ❓ | Exploration by Random Network Distillation, Burda et al. | 2018 | - | RND | Click | Click |
| ❓ | Variational Intrinsic Control, Gregor et al. | 2016 | - | VIC | Click | Click |
| ❓ | Diversity is All You Need: Learning Skills without a Reward Function, Eysenbach et al. | 2018 | - | DIAYN | Click | Click |
| ❓ | Variational Option Discovery Algorithms, Achiam et al. | 2018 | - | VALOR | Click | Click |
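
The simplest form of the count-based idea running through several of the papers above (Bellemare et al. 2016; Tang et al. 2016) can be sketched in a few lines: rarely visited states earn a larger intrinsic reward. In large or continuous state spaces the raw count N(s) is replaced by a density-model pseudo-count or a hash-based count; the scaling coefficient `BETA` here is a hypothetical choice, not a value from any of the papers.

```python
from collections import defaultdict

BETA = 0.1  # hypothetical bonus scale; in practice tuned per environment

class CountBonus:
    """Tabular r_int(s) = beta / sqrt(N(s)) exploration bonus (MBIE-EB style)."""

    def __init__(self):
        self.counts = defaultdict(int)

    def bonus(self, state):
        # Increment the visit count, then pay out a bonus that decays
        # with the square root of how often this state has been seen.
        self.counts[state] += 1
        return BETA / self.counts[state] ** 0.5

cb = CountBonus()
print(cb.bonus("s0"))  # first visit: 0.1
print(cb.bonus("s0"))  # second visit: 0.1 / sqrt(2)
```

This intrinsic reward is simply added to the environment reward during training; the pseudo-count papers above are essentially about estimating N(s) when states cannot be enumerated.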

3. Transfer and Multitask RL

| Read / Notes | Title & Author | Year | Category | Algorithm | Paper | Notes |
| --- | --- | --- | --- | --- | --- | --- |
| ❓ | Progressive Neural Networks, Rusu et al. | 2016 | - | Progressive Networks | Click | Click |
| ❓ | Universal Value Function Approximators, Schaul et al. | 2015 | - | UVFA | Click | Click |
| 04/11/19 #3 😄 | Reinforcement Learning with Unsupervised Auxiliary Tasks, Jaderberg et al. | 2016 | Auxiliary | UNREAL | Click | Click |
| ❓ | The Intentional Unintentional Agent: Learning to Solve Many Continuous Control Tasks Simultaneously, Cabi et al. | 2017 | - | IU Agent | Click | Click |
| ❓ | PathNet: Evolution Channels Gradient Descent in Super Neural Networks, Fernando et al. | 2017 | - | PathNet | Click | Click |
| ❓ | Mutual Alignment Transfer Learning, Wulfmeier et al. | 2017 | - | MATL | Click | Click |
| ❓ | Learning an Embedding Space for Transferable Robot Skills, Hausman et al. | 2018 | - | - | Click | Click |
| ❓ | Hindsight Experience Replay, Andrychowicz et al. | 2017 | - | Hindsight Experience Replay (HER) | Click | Click |

4. Hierarchy

| Read / Notes | Title & Author | Year | Category | Algorithm | Paper | Notes |
| --- | --- | --- | --- | --- | --- | --- |
| ❓ | Strategic Attentive Writer for Learning Macro-Actions, Vezhnevets et al. | 2016 | - | STRAW | Click | Click |
| ❓ | FeUdal Networks for Hierarchical Reinforcement Learning, Vezhnevets et al. | 2017 | - | Feudal Networks | Click | Click |
| ❓ | Data-Efficient Hierarchical Reinforcement Learning, Nachum et al. | 2018 | - | HIRO | Click | Click |

5. Memory

| Read / Notes | Title & Author | Year | Category | Algorithm | Paper | Notes |
| --- | --- | --- | --- | --- | --- | --- |
| ❓ | Model-Free Episodic Control, Blundell et al. | 2016 | - | MFEC | Click | Click |
| ❓ | Neural Episodic Control, Pritzel et al. | 2017 | - | NEC | Click | Click |
| ❓ | Neural Map: Structured Memory for Deep Reinforcement Learning, Parisotto and Salakhutdinov | 2017 | - | Neural Map | Click | Click |
| ❓ | Unsupervised Predictive Memory in a Goal-Directed Agent, Wayne et al. | 2018 | - | MERLIN | Click | Click |
| ❓ | Relational Recurrent Neural Networks, Santoro et al. | 2018 | - | RMC | Click | Click |

6. Model-Based RL

| Read / Notes | Title & Author | Year | Category | Algorithm | Paper | Notes |
| --- | --- | --- | --- | --- | --- | --- |
| 30/12/19 😄 | Dream to Control: Learning Behaviors by Latent Imagination, Hafner et al. | 2019 | Model Learning | Dreamer | Click | Click |
| ❓ | Imagination-Augmented Agents for Deep Reinforcement Learning, Weber et al. | 2017 | - | I2A | Click | Click |
| ❓ | Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning, Nagabandi et al. | 2017 | - | MBMF | Click | Click |
| ❓ | Model-Based Value Expansion for Efficient Model-Free Reinforcement Learning, Feinberg et al. | 2018 | - | MVE | Click | Click |
| ❓ | Sample-Efficient Reinforcement Learning with Stochastic Ensemble Value Expansion, Buckman et al. | 2018 | - | STEVE | Click | Click |
| ❓ | Model-Ensemble Trust-Region Policy Optimization, Kurutach et al. | 2018 | - | ME-TRPO | Click | Click |
| ❓ | Model-Based Reinforcement Learning via Meta-Policy Optimization, Clavera et al. | 2018 | - | MB-MPO | Click | Click |
| ❓ | Recurrent World Models Facilitate Policy Evolution, Ha and Schmidhuber | 2018 | - | - | Click | Click |
| ❓ | Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm, Silver et al. | 2017 | - | AlphaZero | Click | Click |
| ❓ | Thinking Fast and Slow with Deep Learning and Tree Search, Anthony et al. | 2017 | - | ExIt | Click | Click |

7. Meta-RL

| Read / Notes | Title & Author | Year | Category | Algorithm | Paper | Notes |
| --- | --- | --- | --- | --- | --- | --- |
| 14/11/19 #1 😄 | RL^2: Fast Reinforcement Learning via Slow Reinforcement Learning, Duan et al. | 2016 | - | RL^2 | Click | Click |
| 06/01/20 #2 😄 | Meta-Learners' Learning Dynamics Are Unlike Learners', Rabinowitz | 2019 | Learning Dynamics | - | Click | Click |
| ❓ | Learning to Reinforcement Learn, Wang et al. | 2016 | - | - | Click | Click |
| ❓ | Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks, Finn et al. | 2017 | - | MAML | Click | Click |
| ❓ | A Simple Neural Attentive Meta-Learner, Mishra et al. | 2018 | - | SNAIL | Click | Click |

8. Scaling RL

| Read / Notes | Title & Author | Year | Category | Algorithm | Paper | Notes |
| --- | --- | --- | --- | --- | --- | --- |
| 23/12/19 😄 | Human-Level Performance in First-Person Multiplayer Games with Population-Based DRL, Jaderberg et al. | 2018 | Multi-Agent | PBT | Click | Click |
| 24/12/19 😄 | Grandmaster Level in StarCraft II Using MARL, Vinyals et al. | 2019 | Multi-Agent | League | Click | Click |
| 22/12/19 😄 | Emergent Tool Use From Multi-Agent Autocurricula, Baker et al. | 2019 | Self-Play | CT-DE | Click | Click |
| ❓ | Accelerated Methods for Deep Reinforcement Learning, Stooke and Abbeel. Contribution: systematic analysis of parallelization in deep RL across algorithms. | 2018 | - | - | Click | Click |
| ❓ | IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures, Espeholt et al. | 2018 | - | IMPALA | Click | Click |
| ❓ | Distributed Prioritized Experience Replay, Horgan et al. | 2018 | - | Ape-X | Click | Click |
| ❓ | Recurrent Experience Replay in Distributed Reinforcement Learning, Anonymous | 2018 | - | R2D2 | Click | Click |
| ❓ | RLlib: Abstractions for Distributed Reinforcement Learning, Liang et al. Contribution: a scalable library of RL algorithm implementations. | 2017 | - | - | Click | Click |

9. RL in the Real World

| Read / Notes | Title & Author | Year | Category | Algorithm | Paper | Notes |
| --- | --- | --- | --- | --- | --- | --- |
| #1 😄 | Solving Rubik's Cube with a Robot Hand | 2019 | Robotics | ADR | Click | Click |
| ❓ | Benchmarking Reinforcement Learning Algorithms on Real-World Robots, Mahmood et al. | 2018 | - | - | Click | Click |
| ❓ | Learning Dexterous In-Hand Manipulation, OpenAI | 2018 | - | - | Click | Click |
| ❓ | QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation, Kalashnikov et al. | 2018 | - | QT-Opt | Click | Click |
| ❓ | Horizon: Facebook's Open Source Applied Reinforcement Learning Platform, Gauci et al. | 2018 | - | - | Click | Click |

10. Safety

| Read / Notes | Title & Author | Year | Category | Algorithm | Paper | Notes |
| --- | --- | --- | --- | --- | --- | --- |
| ❓ | Concrete Problems in AI Safety, Amodei et al. Contribution: establishes a taxonomy of safety problems, serving as an important jumping-off point for future research. We need to solve these! | 2016 | - | - | Click | Click |
| ❓ | Deep Reinforcement Learning From Human Preferences, Christiano et al. | 2017 | - | LFP | Click | Click |
| ❓ | Constrained Policy Optimization, Achiam et al. | 2017 | - | CPO | Click | Click |
| ❓ | Safe Exploration in Continuous Action Spaces, Dalal et al. | 2018 | - | DDPG + Safety Layer | Click | Click |
| ❓ | Trial without Error: Towards Safe Reinforcement Learning via Human Intervention, Saunders et al. | 2017 | - | HIRL | Click | Click |
| ❓ | Leave No Trace: Learning to Reset for Safe and Autonomous Reinforcement Learning, Eysenbach et al. | 2017 | - | Leave No Trace | Click | Click |

11. Imitation Learning and Inverse Reinforcement Learning

| Read / Notes | Title & Author | Year | Category | Algorithm | Paper | Notes |
| --- | --- | --- | --- | --- | --- | --- |
| 05/11/19 #1 😄 | Distilling Policy Distillation, Czarnecki et al. | 2019 | Distillation | ExpEntropyReg | Click | Click |
| ❓ | Modeling Purposeful Adaptive Behavior with the Principle of Maximum Causal Entropy, Ziebart. Contribution: crisp formulation of maximum entropy IRL. | 2010 | - | - | Click | Click |
| ❓ | Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization, Finn et al. | 2016 | - | GCL | Click | Click |
| ❓ | Generative Adversarial Imitation Learning, Ho and Ermon | 2016 | - | GAIL | Click | Click |
| ❓ | DeepMimic: Example-Guided Deep Reinforcement Learning of Physics-Based Character Skills, Peng et al. | 2018 | - | DeepMimic | Click | Click |
| ❓ | Variational Discriminator Bottleneck: Improving Imitation Learning, Inverse RL, and GANs by Constraining Information Flow, Peng et al. | 2018 | - | VAIL | Click | Click |
| ❓ | One-Shot High-Fidelity Imitation: Training Large-Scale Deep Nets with RL, Le Paine et al. | 2018 | - | MetaMimic | Click | Click |

12. Reproducibility, Analysis, and Critique

| Read / Notes | Title & Author | Year | Category | Algorithm | Paper | Notes |
| --- | --- | --- | --- | --- | --- | --- |
| ❓ | Benchmarking Deep Reinforcement Learning for Continuous Control, Duan et al. Contribution: rllab. | 2016 | - | - | Click | Click |
| ❓ | Reproducibility of Benchmarked Deep Reinforcement Learning Tasks for Continuous Control, Islam et al. | 2017 | - | - | Click | Click |
| ❓ | Deep Reinforcement Learning that Matters, Henderson et al. | 2017 | - | - | | |
| ❓ | Where Did My Optimum Go?: An Empirical Analysis of Gradient Descent Optimization in Policy Gradient Methods, Henderson et al. | 2018 | - | - | Click | Click |
| ❓ | Are Deep Policy Gradient Algorithms Truly Policy Gradient Algorithms?, Ilyas et al. | 2018 | - | - | Click | Click |
| ❓ | Simple Random Search Provides a Competitive Approach to Reinforcement Learning, Mania et al. | 2018 | - | - | Click | Click |
| ❓ | Benchmarking Model-Based Reinforcement Learning, Wang et al. | 2019 | - | - | Click | Click |

13. Multi-Agent RL

| Read / Notes | Title & Author | Year | Category | Algorithm | Paper | Notes |
| --- | --- | --- | --- | --- | --- | --- |
| 😄 | Social Influence as Intrinsic Motivation for MA-DRL, Jaques et al. | 2019 | Social Influence | MOA | Click | Click |

14. KL-Regularized RL

| Read / Notes | Title & Author | Year | Category | Algorithm | Paper | Notes |
| --- | --- | --- | --- | --- | --- | --- |
| 😄 | Information Asymmetry in KL-Regularized RL, Galashov et al. | 2019 | Learned Priors | - | Click | Click |
| 😄 | Neural Probabilistic Motor Primitives, Merel et al. | 2019 | Few-Shot Transfer | NPMP | Click | Click |

15. Bonus: Classic Papers in RL Theory or Review

| Read / Notes | Title & Author | Year | Category | Algorithm | Paper | Notes |
| --- | --- | --- | --- | --- | --- | --- |
| ❓ | Policy Gradient Methods for Reinforcement Learning with Function Approximation, Sutton et al. Contribution: established the policy gradient theorem and showed convergence of policy gradient algorithms for arbitrary policy classes. | 2000 | - | - | Click | Click |
| ❓ | An Analysis of Temporal-Difference Learning with Function Approximation, Tsitsiklis and Van Roy. Contribution: a variety of convergence results and counter-examples for value-learning methods in RL. | 1997 | - | - | Click | Click |
| ❓ | Reinforcement Learning of Motor Skills with Policy Gradients, Peters and Schaal. Contribution: thorough review of policy gradient methods at the time, many of which are still serviceable descriptions of deep RL methods. | 2008 | - | - | Click | Click |
| ❓ | Approximately Optimal Approximate Reinforcement Learning, Kakade and Langford. Contribution: early roots of monotonic-improvement theory, later leading to the theoretical justification for TRPO and other algorithms. | 2002 | - | - | Click | Click |
| ❓ | A Natural Policy Gradient, Kakade. Contribution: brought natural gradients into RL, later leading to TRPO, ACKTR, and several other methods in deep RL. | 2002 | - | - | Click | Click |
| ❓ | Algorithms for Reinforcement Learning, Szepesvari. Contribution: unbeatable reference on pre-deep RL, containing foundations and theoretical background. | 2009 | - | - | Click | Click |
