In this repository I document my self-study of Deep Reinforcement Learning. More specifically, I collect reading notes as well as reproduction attempts. The reading order in this repository follows the excellent "Spinning Up in DRL" tutorial by OpenAI, which in my opinion is the best resource on state-of-the-art DRL available today.
Here are all papers I have read so far, together with the corresponding notes:
1. Deep Q-Learning
| Read / Notes | Title & Author | Year | Category | Algorithm | Paper | Notes |
| --- | --- | --- | --- | --- | --- | --- |
| 13/05/19 #1 - 🔥 | Playing Atari with Deep Reinforcement Learning, Mnih et al. | | | | | |
| | Dopamine: A Research Framework for Deep Reinforcement Learning, Anonymous | 2018 | | | | |
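As a reminder of the core idea behind the Mnih et al. paper, here is a minimal sketch of the Q-learning update that DQN builds on (pure Python and tabular for illustration only; DQN replaces the table with a convolutional network trained with experience replay and a target network):

```python
def q_learning_update(Q, s, a, r, s_next, done, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step: move Q[s][a] toward the TD target
    r + gamma * max_a' Q[s'][a'], bootstrapping with zero at terminal states.

    Q is a dict-of-dicts mapping state -> {action: value}."""
    target = r if done else r + gamma * max(Q[s_next].values())
    Q[s][a] += alpha * (target - Q[s][a])
    return Q
```

DQN minimizes the squared difference between `Q[s][a]` and this same target, with the `max` taken over the outputs of a separate, periodically updated target network for stability.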
1. Model-Free RL: (e) Policy Gradients with Action-Dependent Baselines
Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic, Gu et al, 2016. Algorithm: Q-Prop.
Action-dependent Control Variates for Policy Optimization via Stein's Identity, Liu et al, 2017. Algorithm: Stein Control Variates.
The Mirage of Action-Dependent Baselines in Reinforcement Learning, Tucker et al, 2018. Contribution: interestingly, critiques and re-evaluates claims from earlier papers (including Q-Prop and Stein control variates) and finds important methodological errors in them.
1. Model-Free RL: (f) Path-Consistency Learning
Bridging the Gap Between Value and Policy Based Reinforcement Learning, Nachum et al, 2017.
Trust-PCL: An Off-Policy Trust Region Method for Continuous Control, Nachum et al, 2017.
1. Model-Free RL: (g) Other Directions for Combining Policy-Learning and Q-Learning
Combining Policy Gradient and Q-learning, O'Donoghue et al, 2016.
The Reactor: A Fast and Sample-Efficient Actor-Critic Agent for Reinforcement Learning, Gruslys et al, 2017.
Interpolated Policy Gradient: Merging On-Policy and Off-Policy Gradient Estimation for Deep Reinforcement Learning, Gu et al, 2017.
Equivalence Between Policy Gradients and Soft Q-Learning, Schulman et al, 2017.
1. Model-Free RL: (h) Evolutionary Algorithms
Evolution Strategies as a Scalable Alternative to Reinforcement Learning, Salimans et al, 2017.
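To make the Salimans et al. idea concrete, here is a minimal pure-Python sketch of an evolution-strategies gradient estimator with antithetic (mirrored) sampling, evaluated on a toy objective of my own choosing — not the paper's distributed implementation:

```python
import random

def es_gradient(f, theta, sigma=0.1, n=100, seed=0):
    """Estimate the gradient of E[f(theta + sigma * eps)] w.r.t. theta
    using antithetic perturbations, as in Salimans et al. (2017):
    each sample contributes (f(theta + s*eps) - f(theta - s*eps)) * eps / (2*s)."""
    rng = random.Random(seed)
    d = len(theta)
    grad = [0.0] * d
    for _ in range(n):
        eps = [rng.gauss(0.0, 1.0) for _ in range(d)]
        f_plus = f([t + sigma * e for t, e in zip(theta, eps)])
        f_minus = f([t - sigma * e for t, e in zip(theta, eps)])
        for i in range(d):
            grad[i] += (f_plus - f_minus) * eps[i] / (2.0 * sigma * n)
    return grad
```

The appeal for RL is that `f` only needs to return a scalar episode return, so the estimator requires no backpropagation through the policy and parallelizes trivially across workers.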
2. Exploration
| Read / Notes | Title & Author | Year | Category | Algorithm | Paper | Notes |
| --- | --- | --- | --- | --- | --- | --- |
| # ✔ | VIME: Variational Information Maximizing Exploration, Houthooft et al | 2016 | | VIME | | |
Large-Scale Study of Curiosity-Driven Learning, Burda et al, 2018. Contribution: Systematic analysis of how surprisal-based intrinsic motivation performs in a wide variety of environments.
Accelerated Methods for Deep Reinforcement Learning, Stooke and Abbeel, 2018. Contribution: Systematic analysis of parallelization in deep RL across algorithms.
RLlib: Abstractions for Distributed Reinforcement Learning, Liang et al, 2017. Contribution: A scalable library of RL algorithm implementations. Documentation link.
Concrete Problems in AI Safety, Amodei et al, 2016. Contribution: establishes a taxonomy of safety problems, serving as an important jumping-off point for future research. We need to solve these!
Modeling Purposeful Adaptive Behavior with the Principle of Maximum Causal Entropy, Ziebart 2010. Contributions: Crisp formulation of maximum entropy IRL.
Variational Discriminator Bottleneck: Improving Imitation Learning, Inverse RL, and GANs by Constraining Information Flow, Peng et al, 2018. Algorithm: VAIL.
Policy Gradient Methods for Reinforcement Learning with Function Approximation, Sutton et al, 2000. Contributions: Established policy gradient theorem and showed convergence of policy gradient algorithm for arbitrary policy classes.
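For reference, the theorem this paper established, written in the notation most modern treatments use (with \(d^{\pi_\theta}\) the discounted state visitation distribution):

```latex
\nabla_\theta J(\theta)
  = \mathbb{E}_{s \sim d^{\pi_\theta},\; a \sim \pi_\theta}
    \left[ \nabla_\theta \log \pi_\theta(a \mid s)\, Q^{\pi_\theta}(s, a) \right]
```

The key point is that the gradient of expected return can be estimated from samples without differentiating the (unknown) state distribution, which is what makes REINFORCE-style algorithms possible.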
An Analysis of Temporal-Difference Learning with Function Approximation, Tsitsiklis and Van Roy, 1997. Contributions: Variety of convergence results and counter-examples for value-learning methods in RL.
Reinforcement Learning of Motor Skills with Policy Gradients, Peters and Schaal, 2008. Contributions: Thorough review of policy gradient methods at the time, many of which are still serviceable descriptions of deep RL methods.
Approximately Optimal Approximate Reinforcement Learning, Kakade and Langford, 2002. Contributions: Early roots for monotonic improvement theory, later leading to theoretical justification for TRPO and other algorithms.
A Natural Policy Gradient, Kakade, 2002. Contributions: Brought natural gradients into RL, later leading to TRPO, ACKTR, and several other methods in deep RL.
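In modern notation, the natural gradient introduced here preconditions the vanilla policy gradient with the inverse Fisher information matrix of the policy, making the update invariant to how the policy is parameterized:

```latex
\tilde{\nabla}_\theta J(\theta) = F(\theta)^{-1}\, \nabla_\theta J(\theta),
\qquad
F(\theta) = \mathbb{E}_{\pi_\theta}\!\left[
  \nabla_\theta \log \pi_\theta(a \mid s)\,
  \nabla_\theta \log \pi_\theta(a \mid s)^{\top}
\right]
```

TRPO can be seen as a trust-region variant of this update, and ACKTR as an efficient Kronecker-factored approximation of \(F(\theta)^{-1}\).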
Algorithms for Reinforcement Learning, Szepesvari, 2009. Contributions: Unbeatable reference on RL before deep RL, containing foundations and theoretical background.