cookiegg / Deep-Reinforcement-Learning-Survey

Deep Reinforcement Learning survey

This paper list is a bit different from others: I add my own opinions and short summaries to each entry. To really understand a paper, though, you still have to read it yourself!
Of course, any pull requests or discussion are welcome!

Outline

  • Papers
  • Open Source
  • Course
  • Textbook
  • Misc

Papers

  • Deep Reinforcement Learning with Double Q-learning [AAAI 2016]
    • Hado van Hasselt, Arthur Guez, David Silver
    • Deals with the overestimation of Q-values
    • Separates the network that selects the action from the network that estimates its value (see the Double-DQN target sketch after this list)
  • Playing Atari with Deep Reinforcement Learning [NIPS 2013 Deep Learning Workshop]
    • Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra
  • Human-level control through deep reinforcement learning [Nature 2015]
    • Most optimization algorithms assume that samples are independently and identically distributed, whereas in reinforcement learning the data is a sequence of highly correlated states and actions, which breaks that assumption
    • Experience replay (see the replay-buffer sketch after this list)
    • Iteratively updates the Q-values toward targets computed from a periodically frozen network
    • Source code [Torch]
  • Asynchronous Methods for Deep Reinforcement Learning [ICML 2016]
  • Learning Hand-Eye Coordination for Robotic Grasping with Deep Learning and Large-Scale Data Collection [arXiv 2016]
  • Active Object Localization with Deep Reinforcement Learning [ICCV 2015]
    • Juan C. Caicedo, Svetlana Lazebnik
    • Agent learns to deform a bounding box using simple transformation actions
  • Dueling Network Architectures for Deep Reinforcement Learning [ICML 2016]
    • Ziyu Wang, Tom Schaul, Matteo Hessel, Hado van Hasselt, Marc Lanctot, Nando de Freitas
    • Two-stream network (state-value and advantage function); see the dueling-network sketch after this list
    • Focuses on a neural network architecture better suited for model-free RL, rather than on a new algorithm
    • Torch blog - Dueling Deep Q-Networks
  • Memory-based control with recurrent neural networks [NIPS 2015 Deep Reinforcement Learning Workshop]
    • Nicolas Heess, Jonathan J Hunt, Timothy P Lillicrap, David Silver
    • Addresses partially observed control problems by adding memory (recurrence) to the policy
  • Control of Memory, Active Perception, and Action in Minecraft [arXiv 2016]
    • Junhyuk Oh, Valliappa Chockalingam, Satinder Singh, Honglak Lee
    • Addresses problems involving partial observability
    • Proposes a set of Minecraft-based tasks
    • Memory Q-Network (MQN), Recurrent Memory Q-Network (RMQN), and Feedback Recurrent Memory Q-Network (FRMQN)
  • Continuous Control With Deep Reinforcement Learning [ICLR 2016]
    • Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, Daan Wierstra
    • Solves continuous-control tasks while avoiding the curse of dimensionality that discretizing the action space would cause
    • Deep version of DPG (deterministic policy gradient)
    • Going deep introduces issues: nonlinear function approximation of Q is unstable, so target networks and experience replay are used to stabilize training
    • The components of an observation may have different physical units, and their ranges may vary across environments => solved with batch normalization
    • For exploration, noise is added to the actor policy: µ′(s_t) = µ(s_t | θ_t^µ) + N, where N is a noise process (see the exploration-noise sketch after this list)
  • Deterministic Policy Gradient Algorithms [ICML 2014]
    • D. Silver, G. Lever, N. Heess, T. Degris, D. Wierstra, M. Riedmiller
    • Highly recommended for learning policy network, and actor-critic algorithms
    • In continuous action spaces, greedy policy improvement becomes problematic, requiring a global maximisation at every step. A simple and computationally attractive alternative is to move the policy in the direction of the gradient of Q, rather than globally maximising Q (see the actor-update sketch after this list)
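
A minimal sketch of the Double DQN target described above, assuming PyTorch and two hypothetical networks `q_online` and `q_target` (the names are illustrative, not the authors' code): the online network selects the greedy next action, while the separate target network evaluates it.

```python
import torch

def double_dqn_target(q_online, q_target, reward, next_state, done, gamma=0.99):
    """Double DQN target: select with the online net, evaluate with the
    target net, which curbs the Q-value overestimation of vanilla DQN."""
    with torch.no_grad():
        # Online network chooses the greedy next action...
        next_action = q_online(next_state).argmax(dim=1, keepdim=True)
        # ...but the separate target network scores that action.
        next_q = q_target(next_state).gather(1, next_action).squeeze(1)
    return reward + gamma * (1.0 - done) * next_q
```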
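
The experience replay from the Nature DQN entry can be sketched as a plain ring buffer: sampling uniformly at random breaks the temporal correlation between consecutive transitions that violates the i.i.d. assumption. This is an illustrative sketch, not the paper's Torch implementation.

```python
import random
from collections import deque

class ReplayBuffer:
    """Stores (s, a, r, s', done) transitions; uniform random sampling
    decorrelates consecutive experiences before each gradient step."""
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)  # old transitions fall off

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones
```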
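
A sketch of the dueling architecture's two-stream aggregation, assuming PyTorch (layer sizes are placeholders): Q(s, a) = V(s) + (A(s, a) - mean_a A(s, a)), where subtracting the mean advantage keeps the V and A streams identifiable.

```python
import torch
import torch.nn as nn

class DuelingQNetwork(nn.Module):
    def __init__(self, obs_dim, n_actions, hidden=128):
        super().__init__()
        self.features = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)              # state-value stream V(s)
        self.advantage = nn.Linear(hidden, n_actions)  # advantage stream A(s, a)

    def forward(self, obs):
        h = self.features(obs)
        v = self.value(h)
        a = self.advantage(h)
        # Q(s, a) = V(s) + (A(s, a) - mean_a A(s, a)): the mean subtraction
        # makes the decomposition identifiable, as in the paper's aggregator.
        return v + a - a.mean(dim=1, keepdim=True)
```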
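
For DDPG's exploration rule µ′(s_t) = µ(s_t | θ_t^µ) + N, the paper instantiates N as an Ornstein-Uhlenbeck process to get temporally correlated noise. A NumPy sketch (treat the class itself as illustrative; θ = 0.15 and σ = 0.2 are the paper's suggested values):

```python
import numpy as np

class OUNoise:
    """Ornstein-Uhlenbeck process: temporally correlated noise that is
    added to the deterministic actor's output for exploration."""
    def __init__(self, action_dim, mu=0.0, theta=0.15, sigma=0.2):
        self.mu, self.theta, self.sigma = mu, theta, sigma
        self.state = np.full(action_dim, mu)

    def sample(self):
        # Mean-reverting drift toward mu plus Gaussian diffusion.
        dx = self.theta * (self.mu - self.state) \
             + self.sigma * np.random.randn(*self.state.shape)
        self.state = self.state + dx
        return self.state

# usage: noisy_action = actor(state) + ou.sample()
```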
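
The DPG idea of moving the policy along the gradient of Q, rather than globally maximising Q, reduces in an actor-critic setup to a chain-rule update: ∇_θ J ≈ E[ ∇_a Q(s, a)|_{a=µ_θ(s)} ∇_θ µ_θ(s) ]. With autograd this is a one-line loss; `actor` and `critic` below are hypothetical PyTorch modules, not the paper's code.

```python
import torch

def actor_update(actor, critic, states, actor_optimizer):
    # Maximize Q(s, mu(s)) w.r.t. the actor parameters: autograd applies
    # the chain rule dQ/da * dmu/dtheta, i.e. the deterministic policy
    # gradient, instead of a global argmax over the action space.
    actor_loss = -critic(states, actor(states)).mean()
    actor_optimizer.zero_grad()
    actor_loss.backward()
    actor_optimizer.step()
```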

Open Source

Python users [TensorFlow, Theano]

Lua users [Torch]

Course

Textbook

Misc
