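Author: Alex-zhai. Link: https://zhuanlan.zhihu.com/p/23600620. Source: Zhihu. Copyright belongs to the author. For commercial reprints, contact the author for authorization; for non-commercial reprints, cite the source.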
1. The Original DQN
- Playing Atari with Deep Reinforcement Learning, V. Mnih et al., NIPS Workshop, 2013.
- Human-level control through deep reinforcement learning, V. Mnih et al., Nature, 2015.
2. DQN Variants (Algorithmic Improvements)
- Dueling Network Architectures for Deep Reinforcement Learning, Z. Wang et al., arXiv, 2015.
- Prioritized Experience Replay, T. Schaul et al., ICLR, 2016.
- Deep Reinforcement Learning with Double Q-learning, H. van Hasselt et al., arXiv, 2015.
- Increasing the Action Gap: New Operators for Reinforcement Learning, M. G. Bellemare et al., AAAI, 2016.
- Dynamic Frame skip Deep Q Network, A. S. Lakshminarayanan et al., IJCAI Deep RL Workshop, 2016.
- Deep Exploration via Bootstrapped DQN, I. Osband et al., arXiv, 2016.
- How to Discount Deep Reinforcement Learning: Towards New Dynamic Strategies, V. François-Lavet et al., NIPS Workshop, 2015.
- Learning functions across many orders of magnitudes, H. van Hasselt, A. Guez, M. Hessel, D. Silver
- Massively Parallel Methods for Deep Reinforcement Learning, A. Nair et al., ICML Workshop, 2015.
- State of the Art Control of Atari Games using shallow reinforcement learning
- Learning to Play in a Day: Faster Deep Reinforcement Learning by Optimality Tightening (updated 11.13)
- Deep Reinforcement Learning with Averaged Target DQN (updated 11.14)
3. DQN Variants (Model Improvements)
- Deep Recurrent Q-Learning for Partially Observable MDPs, M. Hausknecht and P. Stone, arXiv, 2015.
- Deep Attention Recurrent Q-Network
- Control of Memory, Active Perception, and Action in Minecraft, J. Oh et al., ICML, 2016.
- Progressive Neural Networks
- Language Understanding for Text-based Games Using Deep Reinforcement Learning
- Learning to Communicate to Solve Riddles with Deep Distributed Recurrent Q-Networks
- Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation
- Recurrent Reinforcement Learning: A Hybrid Approach
4. Policy-Gradient-Based Deep Reinforcement Learning
Deep policy gradient:
- End-to-End Training of Deep Visuomotor Policies
- Learning Deep Control Policies for Autonomous Aerial Vehicles with MPC-Guided Policy Search
- Trust Region Policy Optimization
Deep actor-critic algorithms:
- Deterministic Policy Gradient Algorithms
- Continuous control with deep reinforcement learning
- High-Dimensional Continuous Control Using Generalized Advantage Estimation
- Compatible Value Gradients for Reinforcement Learning of Continuous Deep Policies
- Deep Reinforcement Learning in Parameterized Action Space
- Memory-based control with recurrent neural networks
- Terrain-adaptive locomotion skills using deep reinforcement learning
- Sample Efficient Actor-Critic with Experience Replay (updated 11.13)
Search and supervision:
- End-to-End Training of Deep Visuomotor Policies
- Interactive Control of Diverse Complex Characters with Neural Networks
Improved exploration in continuous action spaces:
- Curiosity-driven Exploration in Deep Reinforcement Learning via Bayesian Neural Networks
Combining policy gradient and Q-learning:
- Q-Prop: Sample-Efficient Policy Gradient with an Off-Policy Critic (updated 11.13)
- PGQ: Combining Policy Gradient and Q-learning (updated 11.13)
Other policy gradient papers:
- Gradient Estimation Using Stochastic Computation Graphs
- Continuous Deep Q-Learning with Model-based Acceleration
- Benchmarking Deep Reinforcement Learning for Continuous Control
- Learning Continuous Control Policies by Stochastic Value Gradients
5. Hierarchical DRL
- Deep Successor Reinforcement Learning
- Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation
- Hierarchical Reinforcement Learning using Spatio-Temporal Abstractions and Deep Neural Networks
- Stochastic Neural Networks for Hierarchical Reinforcement Learning, C. Florensa, Y. Duan, P. Abbeel (updated 11.14)
6. Multi-Task and Transfer Learning in DRL
- ADAAPT: A Deep Architecture for Adaptive Policy Transfer from Multiple Sources
- A Deep Hierarchical Approach to Lifelong Learning in Minecraft
- Actor-Mimic: Deep Multitask and Transfer Reinforcement Learning
- Policy Distillation
- Progressive Neural Networks
- Universal Value Function Approximators
- Multi-task learning with deep model based reinforcement learning (updated 11.14)
- Modular Multitask Reinforcement Learning with Policy Sketches (updated 11.14)
7. DRL Models with External Memory
- Control of Memory, Active Perception, and Action in Minecraft
- Model-Free Episodic Control
8. Exploration vs. Exploitation in DRL
- Action-Conditional Video Prediction using Deep Networks in Atari Games
- Curiosity-driven Exploration in Deep Reinforcement Learning via Bayesian Neural Networks
- Deep Exploration via Bootstrapped DQN
- Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation
- Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models
- Unifying Count-Based Exploration and Intrinsic Motivation
- #Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning (updated 11.14)
- Surprise-Based Intrinsic Motivation for Deep Reinforcement Learning (updated 11.14)
9. Multi-Agent DRL
- Learning to Communicate to Solve Riddles with Deep Distributed Recurrent Q-Networks
- Multiagent Cooperation and Competition with Deep Reinforcement Learning
10. Inverse DRL
- Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization
- Maximum Entropy Deep Inverse Reinforcement Learning
- Generalizing Skills with Semi-Supervised Reinforcement Learning (updated 11.14)
11. Exploration + Supervised Learning
- Deep learning for real-time Atari game play using offline Monte-Carlo tree search planning
- Better Computer Go Player with Neural Network and Long-term Prediction
- Mastering the game of Go with deep neural networks and tree search, D. Silver et al., Nature, 2016.
12. Asynchronous DRL
- Asynchronous Methods for Deep Reinforcement Learning
- Reinforcement Learning through Asynchronous Advantage Actor-Critic on a GPU (updated 11.14)
13. Harder Game Environments
- Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation, T. D. Kulkarni et al., arXiv, 2016.
- Strategic Attentive Writer for Learning Macro-Actions
- Unifying Count-Based Exploration and Intrinsic Motivation
14. A Single Network Playing Multiple Games
- Policy Distillation
- Universal Value Function Approximators
- Learning values across many orders of magnitude
15. Texas Hold'em Poker
- Deep Reinforcement Learning from Self-Play in Imperfect-Information Games
- Fictitious Self-Play in Extensive-Form Games
- Smooth UCT search in computer poker
16. Doom
- ViZDoom: A Doom-based AI Research Platform for Visual Reinforcement Learning
- Training Agent for First-Person Shooter Game with Actor-Critic Curriculum Learning
- Playing FPS Games with Deep Reinforcement Learning
- Learning to Act by Predicting the Future (updated 11.13)
- Deep Reinforcement Learning From Raw Pixels in Doom (updated 11.14)
17. Large Action Spaces
- Deep Reinforcement Learning in Large Discrete Action Spaces
18. Parameterized Continuous Action Spaces
- Deep Reinforcement Learning in Parameterized Action Space
19. Deep Model
- Learning Visual Predictive Models of Physics for Playing Billiards
- J. Schmidhuber, On Learning to Think: Algorithmic Information Theory for Novel Combinations of Reinforcement Learning Controllers and Recurrent Neural World Models, arXiv, 2015.
- Learning Continuous Control Policies by Stochastic Value Gradients
- Data-Efficient Learning of Feedback Policies from Image Pixels using Deep Dynamical Models
- Action-Conditional Video Prediction using Deep Networks in Atari Games
- Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models
20. DRL Applications
Robotics:
- Trust Region Policy Optimization
- Towards Vision-Based Deep Reinforcement Learning for Robotic Motion Control
- Path Integral Guided Policy Search
- Memory-based control with recurrent neural networks
- Learning Hand-Eye Coordination for Robotic Grasping with Deep Learning and Large-Scale Data Collection
- Learning Deep Neural Network Policies with Continuous Memory States
- High-Dimensional Continuous Control Using Generalized Advantage Estimation
- Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization
- End-to-End Training of Deep Visuomotor Policies
- DeepMPC: Learning Deep Latent Features for Model Predictive Control
- Deep Visual Foresight for Planning Robot Motion
- Deep Reinforcement Learning for Robotic Manipulation
- Continuous Deep Q-Learning with Model-based Acceleration
- Collective Robot Reinforcement Learning with Distributed Asynchronous Guided Policy Search
- Asynchronous Methods for Deep Reinforcement Learning
- Learning Continuous Control Policies by Stochastic Value Gradients
Machine translation:
- Simultaneous Machine Translation using Deep Reinforcement Learning
Object localization:
- Active Object Localization with Deep Reinforcement Learning
Target-driven visual navigation:
- Target-driven Visual Navigation in Indoor Scenes using Deep Reinforcement Learning
Automatic hyperparameter tuning:
- Using Deep Q-Learning to Control Optimization Hyperparameters
Dialogue systems:
- Deep Reinforcement Learning for Dialogue Generation
- SimpleDS: A Simple Deep Reinforcement Learning Dialogue System
- Strategic Dialogue Management via Deep Reinforcement Learning
- Towards End-to-End Learning for Dialog State Tracking and Management using Deep Reinforcement Learning
Video prediction:
- Action-Conditional Video Prediction using Deep Networks in Atari Games
Text-to-speech:
- WaveNet: A Generative Model for Raw Audio
Text generation:
- Generating Text with Deep Reinforcement Learning
Text-based games:
- Language Understanding for Text-based Games Using Deep Reinforcement Learning
Radio control and signal monitoring:
- Deep Reinforcement Learning Radio Control and Signal Detection with KeRLym, a Gym RL Agent
Learning to perform physics experiments with DRL:
- Learning to Perform Physics Experiments via Deep Reinforcement Learning (updated 11.13)
Accelerating convergence with DRL:
- Deep Reinforcement Learning for Accelerating the Convergence Rate (updated 11.14)
Designing neural networks with DRL:
- Designing Neural Network Architectures using Reinforcement Learning (updated 11.14)
- Tuning Recurrent Neural Networks with Reinforcement Learning (updated 11.14)
- Neural Architecture Search with Reinforcement Learning (updated 11.14)
Traffic signal control:
- Using a Deep Reinforcement Learning Agent for Traffic Signal Control (updated 11.14)
21. Other Directions
Avoiding dangerous states:
- Combating Deep Reinforcement Learning’s Sisyphean Curse with Intrinsic Fear (updated 11.14)
On-policy vs. off-policy comparison in DRL:
- On-Policy vs. Off-Policy Updates for Deep Reinforcement Learning (updated 11.14)