Deep Reinforcement Learning Workbook

Author: Robert Tjarko Lange | 2019

In this repository I document my self-study of Deep Reinforcement Learning. More specifically, I collect reading notes as well as reproduction attempts. The chronology of this repository follows the amazing "Spinning Up in DRL" tutorial by OpenAI, which, in my mind, is the best resource on state-of-the-art DRL available today.

Here are all the papers I have read so far, together with the corresponding notes:

1. Model-Free RL: (a) Deep Q-Learning

Supplementary Papers

1. Model-Free RL: (b) Policy Gradients

1. Model-Free RL: (c) Deterministic Policy Gradients

1. Model-Free RL: (d) Distributional RL

  • A Distributional Perspective on Reinforcement Learning, Bellemare et al, 2017.
  • Distributional Reinforcement Learning with Quantile Regression, Dabney et al, 2017.
  • Implicit Quantile Networks for Distributional Reinforcement Learning, Dabney et al, 2018.
  • Dopamine: A Research Framework for Deep Reinforcement Learning, Anonymous, 2018.
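As a pointer into the quantile-regression branch of the papers above, here is a minimal NumPy sketch of the quantile Huber loss that QR-DQN (Dabney et al., 2017) minimises per state-action pair. The variable names and the single-sample setup are my own illustration, not the paper's reference code.

```python
import numpy as np

def quantile_huber_loss(theta, target, kappa=1.0):
    """Quantile-regression Huber loss (QR-DQN style) for one state-action pair.

    theta:  (N,) predicted quantile values at midpoints tau_i = (i + 0.5) / N
    target: (M,) sampled target values, e.g. r + gamma * target-network quantiles
    """
    N = theta.shape[0]
    tau_hat = (np.arange(N) + 0.5) / N                  # quantile midpoints
    u = target[None, :] - theta[:, None]                # pairwise TD errors, shape (N, M)
    huber = np.where(np.abs(u) <= kappa,
                     0.5 * u ** 2,
                     kappa * (np.abs(u) - 0.5 * kappa))
    weight = np.abs(tau_hat[:, None] - (u < 0.0).astype(float))
    return np.mean(weight * huber / kappa)
```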

1. Model-Free RL: (e) Policy Gradients with Action-Dependent Baselines

  • Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic, Gu et al, 2016. Algorithm: Q-Prop.
  • Action-dependent Control Variates for Policy Optimization via Stein’s Identity, Liu et al, 2017. Algorithm: Stein Control Variates.
  • The Mirage of Action-Dependent Baselines in Reinforcement Learning, Tucker et al, 2018. Contribution: interestingly, this paper critiques and re-evaluates claims from earlier papers (including Q-Prop and Stein control variates) and finds important methodological errors in them.
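The common thread in this subsection is the control-variate argument: subtracting a baseline from the return leaves the policy-gradient estimator unbiased while (hopefully) reducing its variance, and the papers above push this from state-dependent to action-dependent baselines. A toy sketch of the plain (action-independent) version on a made-up three-armed bandit illustrates the unchanged mean and reduced variance; everything in it is my own illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy bandit: softmax policy over 3 actions, fixed reward per action.
logits = np.array([0.2, -0.1, 0.4])
rewards = np.array([1.0, 0.0, 2.0])

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def grad_log_pi(a, probs):
    # d/dlogits log pi(a) for a softmax policy
    return np.eye(len(probs))[a] - probs

probs = softmax(logits)
baseline = (probs * rewards).sum()          # expected reward under the current policy

plain, with_baseline = [], []
for _ in range(10_000):
    a = rng.choice(3, p=probs)
    g = grad_log_pi(a, probs)
    plain.append(rewards[a] * g)
    with_baseline.append((rewards[a] - baseline) * g)

print("mean (plain)   :", np.mean(plain, axis=0))
print("mean (baseline):", np.mean(with_baseline, axis=0))     # same expectation
print("var  (plain)   :", np.var(plain, axis=0).sum())
print("var  (baseline):", np.var(with_baseline, axis=0).sum())  # lower variance
```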

1. Model-Free RL: (f) Path-Consistency Learning

  • Bridging the Gap Between Value and Policy Based Reinforcement Learning, Nachum et al, 2017.
  • Trust-PCL: An Off-Policy Trust Region Method for Continuous Control, Nachum et al, 2017.
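Hedged heavily: as I read Nachum et al. (2017), the central object in PCL is a soft path-consistency residual over a length-d sub-trajectory, and the training loss is its square. The helper below is my own illustrative reconstruction of that residual, not code from either paper.

```python
import numpy as np

def path_consistency_error(values, rewards, log_pis, gamma=0.99, tau=0.01):
    """Soft path-consistency residual for one sub-trajectory of length d.

    values:  (d + 1,) value estimates V(s_t), ..., V(s_{t+d})
    rewards: (d,)     rewards r_t, ..., r_{t+d-1}
    log_pis: (d,)     log pi(a_t | s_t), ..., log pi(a_{t+d-1} | s_{t+d-1})
    tau:     entropy-regularisation temperature
    """
    d = len(rewards)
    discounts = gamma ** np.arange(d)
    inner = np.sum(discounts * (rewards - tau * log_pis))
    return -values[0] + gamma ** d * values[-1] + inner

# PCL minimises the squared residual over sampled sub-trajectories:
# loss = 0.5 * path_consistency_error(values, rewards, log_pis) ** 2
```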

1. Model-Free RL: (g) Other Directions for Combining Policy-Learning and Q-Learning

  • Combining Policy Gradient and Q-learning, O’Donoghue et al, 2016.
  • The Reactor: A Fast and Sample-Efficient Actor-Critic Agent for Reinforcement Learning, Gruslys et al, 2017.
  • Interpolated Policy Gradient: Merging On-Policy and Off-Policy Gradient Estimation for Deep Reinforcement Learning, Gu et al, 2017.
  • Equivalence Between Policy Gradients and Soft Q-Learning, Schulman et al, 2017.
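Several of these papers lean on the entropy-regularised setting, where acting on Q yields a Boltzmann policy and the state value becomes a log-sum-exp; Schulman et al. (2017) use exactly this correspondence to relate soft Q-learning updates to policy gradients. A minimal sketch of those two quantities (variable names are my own):

```python
import numpy as np

def soft_policy_from_q(q_values, tau=0.1):
    """Boltzmann policy pi(a|s) proportional to exp(Q(s, a) / tau)."""
    z = q_values / tau
    z = z - z.max()                       # numerical stability
    p = np.exp(z)
    return p / p.sum()

def soft_state_value(q_values, tau=0.1):
    """Soft value V(s) = tau * log sum_a exp(Q(s, a) / tau)."""
    m = q_values.max()
    return m + tau * np.log(np.sum(np.exp((q_values - m) / tau)))
```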

1. Model-Free RL: (h) Evolutionary Algorithms

  • Evolution Strategies as a Scalable Alternative to Reinforcement Learning, Salimans et al, 2017.
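Since this subsection currently holds a single paper, here is a stripped-down sketch of the estimator it is built on: perturb the parameters with Gaussian noise, score each perturbation, and step along the noise directions weighted by the shaped returns. The quadratic `fitness` stands in for an episode return, and plain standardisation replaces the rank-based shaping and antithetic sampling used by Salimans et al. (2017); treat it as an assumption-laden toy, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def fitness(theta):
    # Stand-in for an episode return; in the paper this would be the total
    # reward of a policy parameterised by theta, rolled out in the environment.
    return -np.sum((theta - 3.0) ** 2)

theta = np.zeros(5)
sigma, alpha, pop_size = 0.1, 0.02, 50   # noise scale, step size, population size

for step in range(300):
    eps = rng.standard_normal((pop_size, theta.size))              # sample perturbations
    returns = np.array([fitness(theta + sigma * e) for e in eps])  # evaluate population
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)  # simple return shaping
    theta += alpha / (pop_size * sigma) * eps.T @ returns          # ES gradient step

print(theta)   # approaches the optimum at [3, 3, 3, 3, 3]
```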
