Beronx86 / Deep-Reinforcement-Learning-Survey


Deep Reinforcement Learning survey

This paper list is a bit different from others: I'll add my own opinions and summaries to each entry. To fully understand a paper, though, you still have to read it yourself!
Of course, any pull requests or discussion are welcome!

Outline

Papers

  • Deep Reinforcement Learning with Double Q-learning [AAAI 2016]
    • Hado van Hasselt, Arthur Guez, David Silver
    • Deals with the overestimation of Q-values in standard Q-learning
    • Decouples the Q-network used for action selection from the one used for value estimation (see the sketch below)
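
A minimal sketch of the Double DQN target computation (the `online_q` and `target_q` callables and the array shapes are illustrative assumptions, not the authors' code):

```python
import numpy as np

def double_dqn_targets(rewards, next_states, dones, online_q, target_q, gamma=0.99):
    """Compute Double DQN targets for a batch of transitions."""
    # The online network selects the greedy action ...
    best_actions = np.argmax(online_q(next_states), axis=1)
    # ... while the target network evaluates that action,
    # which reduces the overestimation bias of plain max-Q targets.
    next_values = target_q(next_states)[np.arange(len(best_actions)), best_actions]
    return rewards + gamma * (1.0 - dones) * next_values
```
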
  • Playing Atari with Deep Reinforcement Learning [NIPS 2013 Deep Learning Workshop]
    • Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra
  • Human-level control through deep reinforcement learning [Nature 2015]
    • Most optimization algorithms assume that the samples are independently and identically distributed, while in reinforcement learning the data is a sequence of actions, which breaks this assumption.
    • Experience replay (off-policy); sketched below
    • Iteratively updates the Q-values toward targets from a periodically updated target network
    • Source code [Torch]
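
A minimal sketch of experience replay, assuming transitions are stored as `(state, action, reward, next_state, done)` tuples (class and parameter names are illustrative):

```python
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        # Uniform random sampling breaks the temporal correlation of
        # consecutive transitions, which is what violates the i.i.d.
        # assumption in the first place.
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones
```
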
  • Asynchronous Methods for Deep Reinforcement Learning [ICML 2016]
  • Learning Hand-Eye Coordination for Robotic Grasping with Deep Learning and Large-Scale Data Collection [arXiv 2016]
  • Active Object Localization with Deep Reinforcement Learning [ICCV 2015]
  • Dueling Network Architectures for Deep Reinforcement Learning [ICML 2016]
    • Ziyu Wang, Tom Schaul, Matteo Hessel, Hado van Hasselt, Marc Lanctot, Nando de Freitas
    • Best Paper in ICML 2016
    • Poses the question: is a conventional CNN architecture suitable for RL tasks?
    • Two-stream network (state-value and advantage function); see the sketch below
    • Focusing on innovating a neural network architecture that is better suited for model-free RL
    • Torch blog - Dueling Deep Q-Networks
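
A minimal sketch of how the two streams are combined into Q-values, using the mean-advantage baseline described in the paper (numpy arrays stand in for the network outputs):

```python
import numpy as np

def dueling_q_values(value, advantages):
    """Combine the state-value and advantage streams.

    value:      shape (batch, 1)            -- V(s)
    advantages: shape (batch, num_actions)  -- A(s, a)
    """
    # Subtracting the mean advantage keeps the decomposition identifiable.
    return value + advantages - advantages.mean(axis=1, keepdims=True)
```
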
  • Memory-based control with recurrent neural networks [NIPS 2015 Deep Reinforcement Learning Workshop]
    • Nicolas Heess, Jonathan J Hunt, Timothy P Lillicrap, David Silver
    • Use RNN to solve partially-observed problem
  • Control of Memory, Active Perception, and Action in Minecraft [arXiv 2016]
    • Junhyuk Oh, Valliappa Chockalingam, Satinder Singh, Honglak Lee
    • Addresses problems involving partial observability
    • Proposes a set of Minecraft tasks
    • Memory Q-Network (MQN), Recurrent Memory Q-Network (RMQN), and Feedback Recurrent Memory Q-Network (FRMQN)
  • Continuous Control With Deep Reinforcement Learning [ICLR 2016]
    • Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, Daan Wierstra
    • Solves continuous control tasks and avoids the curse of dimensionality
    • Deep version of DPG (deterministic policy gradient)
    • When going deep, some issues arise: using a non-linear function approximator makes training unstable
    • The different components of the observation may have different physical units, and the ranges may vary across environments; this is handled with batch normalization
    • For exploration, noise is added to the actor policy: µ'(s_t) = µ(s_t | θ^µ_t) + N (sketched below)
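
A minimal sketch of the exploration rule above, with Ornstein-Uhlenbeck noise as used in the paper (the `actor` callable and the hyper-parameter values are illustrative):

```python
import numpy as np

class OUNoise:
    """Ornstein-Uhlenbeck process: temporally correlated exploration noise."""
    def __init__(self, action_dim, mu=0.0, theta=0.15, sigma=0.2):
        self.mu, self.theta, self.sigma = mu, theta, sigma
        self.state = np.full(action_dim, mu)

    def sample(self):
        dx = self.theta * (self.mu - self.state) + self.sigma * np.random.randn(len(self.state))
        self.state = self.state + dx
        return self.state

def noisy_action(actor, state, noise):
    # mu'(s_t) = mu(s_t | theta^mu_t) + N
    return actor(state) + noise.sample()
```
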
  • Deterministic Policy Gradient Algorithms [ICML 2014]
    • D. Silver, G. Lever, N. Heess, T. Degris, D. Wierstra, M. Riedmiller
    • Highly recommended for learning policy network, and actor-critic algorithms
    • In continuous action spaces, greedy policy improvement becomes problematic, requiring a global maximisation at every step. Instead, a simple and computationally attractive alternative is to move the policy in the direction of the gradient of Q, rather than globally maximising Q
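
In symbols, the deterministic policy gradient theorem from the paper gives the direction in which to move the policy parameters (notation lightly simplified):

```latex
\nabla_{\theta} J(\mu_{\theta})
  = \mathbb{E}_{s \sim \rho^{\mu}}\!\left[
      \nabla_{\theta}\, \mu_{\theta}(s)\,
      \nabla_{a} Q^{\mu}(s, a)\big|_{a = \mu_{\theta}(s)}
    \right]
```
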
  • Mastering the game of Go with deep neural networks and tree search [Nature 2016]
    • David Silver, Aja Huang
    • First stage: supervised learning of policy networks, including a rollout policy and the SL policy network (which learns from human expert games)
      • The rollout policy is used for fast but relatively inaccurate move prediction
      • The SL policy network is used to initialize the RL policy network (which is then improved by policy gradient)
    • To prevent overfitting, samples are auto-generated from self-play (half) and mixed with the KGS dataset (half) for training
    • Uses Monte Carlo tree search together with the policy network and value network. To understand MCTS better, please refer to here
      • Selection: select the most promising action according to Q + u(P), down to depth L (see the selection sketch below)
      • Expansion: after L steps, a new child node is created
      • Evaluation: the leaf is evaluated by a mixture of the value network and a simulated rollout
      • Backup: compute and store Q(s,a) and N(s,a), which are used during Selection
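
A minimal sketch of the Selection rule Q + u(P) (the edge statistics and the exploration constant c_puct are illustrative assumptions; the bookkeeping follows the paper only loosely):

```python
import math

def select_action(edges, c_puct=5.0):
    """Pick the action maximising Q(s, a) + u(s, a).

    `edges` maps each action to an object with prior probability P,
    visit count N, and mean action value Q.
    """
    total_visits = sum(edge.N for edge in edges.values())

    def score(edge):
        # u(s, a) grows with the prior P and decays as the edge is visited.
        u = c_puct * edge.P * math.sqrt(total_visits) / (1 + edge.N)
        return edge.Q + u

    return max(edges, key=lambda action: score(edges[action]))
```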

Open Source

Python users [TensorFlow, Theano]

Lua users [Torch]

Course

Textbook

Misc
