(LeNet) LeCun, Yann, et al. "Gradient-based learning applied to document recognition." Proceedings of the IEEE 86.11 (1998).
(AlexNet) Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "Imagenet classification with deep convolutional neural networks." Advances in neural information processing systems. (2012).
(ZFNet) Zeiler, Matthew D., and Rob Fergus. "Visualizing and understanding convolutional networks." European conference on computer vision. Springer, Cham, (2014).
(NIN) Lin, Min, Qiang Chen, and Shuicheng Yan. "Network in network." (2013). [arXiv:1312.4400]
(VGGNet) Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." (2014). [arXiv:1409.1556]
(GoogLeNet) Szegedy, Christian, et al. "Going deeper with convolutions." Proceedings of the IEEE conference on computer vision and pattern recognition. 2015.
(BN) Ioffe, Sergey, and Christian Szegedy. "Batch normalization: Accelerating deep network training by reducing internal covariate shift." International Conference on Machine Learning. (2015). [arXiv:1502.03167]
(ResNet) He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the IEEE conference on computer vision and pattern recognition. (2016). [arXiv:1512.03385] [CVPR 2016 Best Paper] ⭐
(Pre-activation) He, Kaiming, et al. "Identity mappings in deep residual networks." European Conference on Computer Vision. Springer International Publishing. (2016). [arXiv:1603.05027]
Huang, Gao, et al. "Deep networks with stochastic depth." European Conference on Computer Vision. Springer, Cham, 2016. [arXiv:1603.09382]
(ResNeXt) Xie, Saining, et al. "Aggregated residual transformations for deep neural networks." 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, (2017). [arXiv:1611.05431]
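The core idea behind the ResNet entry above is the identity shortcut: a block learns a residual F(x) and outputs F(x) + x, which keeps very deep stacks trainable. Below is a minimal PyTorch-style sketch of a basic post-activation block, assuming stride 1 and matching channel counts so the shortcut needs no projection; it is an illustration, not the paper's reference implementation.

```python
import torch.nn as nn

class BasicBlock(nn.Module):
    """Minimal sketch of a ResNet basic block: out = ReLU(F(x) + x)."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        residual = x                       # identity shortcut
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + residual)   # add, then ReLU (post-activation)
```

The pre-activation variant from the identity-mappings paper moves BN and ReLU before each convolution and drops the ReLU after the addition.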
Keskar, Nitish Shirish, and Richard Socher. "Improving Generalization Performance by Switching from Adam to SGD." arXiv preprint (2017). [arXiv:1712.07628]
Loshchilov, Ilya, and Frank Hutter. "SGDR: Stochastic gradient descent with warm restarts." arXiv preprint (2016). [arXiv:1608.03983] ⭐
Smith, Leslie N. "Cyclical learning rates for training neural networks." 2017 IEEE Winter Conference on Applications of Computer Vision (WACV). IEEE, (2017). [arXiv:1506.01186]
Huang, Gao, et al. "Snapshot ensembles: Train 1, get M for free." arXiv preprint (2017). [arXiv:1704.00109]
Jaderberg, Max, et al. "Population based training of neural networks." arXiv preprint (2017). [arXiv:1711.09846]
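The SGDR, cyclical-learning-rate and snapshot-ensemble entries above all rely on periodic learning-rate schedules. As a rough sketch, the cosine annealing with warm restarts from SGDR sets the learning rate within each cycle as below; eta_min, eta_max and the cycle length are hyperparameters, and cycle lengths are typically multiplied by a constant after each restart.

```python
import math

def sgdr_lr(t_cur, t_i, eta_min=0.0, eta_max=0.1):
    """Cosine-annealed learning rate within one SGDR cycle.

    t_cur: epochs elapsed since the last (warm) restart.
    t_i:   length of the current cycle in epochs.
    """
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * t_cur / t_i))
```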
Generative Adversarial Network
Goodfellow, Ian, et al. "Generative adversarial nets." Advances in neural information processing systems. (2014). [arXiv:1406.2661]
Mirza, Mehdi, and Simon Osindero. "Conditional generative adversarial nets." (2014). [arXiv:1411.1784]
Radford, Alec, Luke Metz, and Soumith Chintala. "Unsupervised representation learning with deep convolutional generative adversarial networks." (2015). [arXiv:1511.06434]
Reed, Scott, et al. "Generative adversarial text to image synthesis." (2016). [arXiv:1605.05396]
Shrivastava, Ashish, et al. "Learning from simulated and unsupervised images through adversarial training." (2016). [arXiv:1612.07828]
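For reference, the minimax objective from the original GAN paper above, with generator G, discriminator D, data distribution p_data and noise prior p_z:

```latex
\min_G \max_D \; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\!\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z}\!\left[\log\left(1 - D(G(z))\right)\right]
```

In practice the paper trains G to maximize log D(G(z)) instead, which gives stronger gradients early in training; the conditional and DCGAN variants above keep this objective but change the inputs and network architecture.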
Mnih, Volodymyr, et al. "Playing atari with deep reinforcement learning." (2013). [arXiv:1312.5602]
Mnih, Volodymyr, et al. "Human-level control through deep reinforcement learning." (2015). [Nature 518.7540] ⭐
Other improvements:
(DDQN) Van Hasselt, Hado, Arthur Guez, and David Silver. "Deep Reinforcement Learning with Double Q-Learning." AAAI. 2016. [arXiv:1509.06461]
Schaul, Tom, et al. "Prioritized experience replay."(2015). [arXiv:1511.05952]
Wang, Ziyu, et al. "Dueling network architectures for deep reinforcement learning." (2015). [arXiv:1511.06581] [ICML2016 Best Paper]
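The DQN and Double DQN entries differ only in the bootstrap target; with theta the online network and theta^- the periodically-copied target network:

```latex
y^{\text{DQN}}  = r + \gamma \max_{a'} Q_{\theta^{-}}(s', a'),
\qquad
y^{\text{DDQN}} = r + \gamma \, Q_{\theta^{-}}\!\left(s', \arg\max_{a'} Q_{\theta}(s', a')\right)
```

Double DQN selects the bootstrap action with the online network but evaluates it with the target network, which reduces the overestimation bias analysed by Van Hasselt et al.; prioritized replay and dueling networks instead change the replay sampling scheme and the network head.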
Actor-Critic
(DDPG) Lillicrap, Timothy P., et al. "Continuous control with deep reinforcement learning." (2015). [arXiv:1509.02971]
(A3C) Mnih, Volodymyr, et al. "Asynchronous methods for deep reinforcement learning." ICML (2016). [arXiv:1602.01783] ⭐
(ACER) Wang, Ziyu, et al. "Sample efficient actor-critic with experience replay." (2016). [arXiv:1611.01224]
(ACKTR) Wu, Yuhuai, et al. "Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation." Advances in Neural Information Processing Systems. (2017). [arXiv:1708.05144]
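The actor-critic entries above share the policy-gradient-with-baseline update; in A3C each worker accumulates gradients of roughly the form below, with a k-step return as the advantage estimate and an entropy bonus (not shown) added for exploration:

```latex
\nabla_{\theta} \log \pi_{\theta}(a_t \mid s_t)\, A(s_t, a_t),
\qquad
A(s_t, a_t) \approx \sum_{i=0}^{k-1} \gamma^{i} r_{t+i} + \gamma^{k} V_{\phi}(s_{t+k}) - V_{\phi}(s_t)
```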
More
(UNREAL) Jaderberg, Max, et al. "Reinforcement learning with unsupervised auxiliary tasks." (2016). [arXiv:1611.05397]
(TRPO) Schulman, John, et al. "Trust region policy optimization." Proceedings of the 32nd International Conference on Machine Learning (ICML-15). (2015). [arXiv:1502.05477]
Heess, Nicolas, et al. "Emergence of locomotion behaviours in rich environments." (2017). [arXiv:1707.02286]
Hessel, Matteo, et al. "Rainbow: Combining Improvements in Deep Reinforcement Learning." (2017). [arXiv:1710.02298]
Andrychowicz, Marcin, et al. "Learning to learn by gradient descent by gradient descent." Advances in Neural Information Processing Systems. (2016). [arXiv:1606.04474]
(GAIL) Ho, Jonathan, and Stefano Ermon. "Generative adversarial imitation learning." Advances in Neural Information Processing Systems. (2016). [arXiv:1606.03476]
(InfoGAIL) Li, Yunzhu, Jiaming Song, and Stefano Ermon. "InfoGAIL: Interpretable Imitation Learning from Visual Demonstrations." Advances in Neural Information Processing Systems. (2017). [arXiv:1703.08840]
Lample, Guillaume, and Devendra Singh Chaplot. "Playing FPS Games with Deep Reinforcement Learning." AAAI. (2017). [arXiv:1609.05521]
O'Donoghue, Brendan, et al. "Combining policy gradient and Q-learning." (2016). [arXiv:1611.01626]
Merel, Josh, et al. "Learning human behaviors from motion capture by adversarial imitation." (2017). [arXiv:1707.02201]
Liu, YuXuan, et al. "Imitation from observation: Learning to imitate behaviors from raw video via context translation." (2017). [arXiv:1707.03374]
Hester, Todd, et al. "Deep Q-learning from Demonstrations." Proceedings of the Conference on Artificial Intelligence (AAAI). (2018). [arXiv:1704.03732]
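GAIL above frames imitation as a GAN-style game over state-action pairs between the learner's policy pi and a discriminator D, with pi_E the expert policy and H a causal-entropy regulariser:

```latex
\min_{\pi} \max_{D} \;
  \mathbb{E}_{\pi}\!\left[\log D(s, a)\right]
  + \mathbb{E}_{\pi_E}\!\left[\log\left(1 - D(s, a)\right)\right]
  - \lambda H(\pi)
```

The policy is then optimized with a policy-gradient method (TRPO in the paper) against the learned cost log D(s, a); InfoGAIL adds a latent code and a mutual-information term on top of this objective.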
Computer Games
2048-Like Games
Szubert, Marcin, and Wojciech Jaśkowski. "Temporal difference learning of n-tuple networks for the game 2048." 2014 IEEE Conference on Computational Intelligence and Games (CIG). IEEE, (2014).
Wu, I-Chen, et al. "Multi-stage temporal difference learning for 2048." Technologies and Applications of Artificial Intelligence. Springer, Cham, (2014).
Yeh, Kun-Hao, et al. "Multi-stage temporal difference learning for 2048-like games." IEEE Transactions on Computational Intelligence and AI in Games (2016).
Jaśkowski, Wojciech. "Mastering 2048 with Delayed Temporal Coherence Learning, Multi-Stage Weight Promotion, Redundant Encoding and Carousel Shaping." IEEE Transactions on Computational Intelligence and AI in Games (2017). ⭐
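The four 2048 entries above all learn n-tuple network value functions with temporal-difference methods. Below is a much-simplified Python sketch of generic TD(0) learning over afterstate values; the tuple layout is hypothetical, and the multi-stage, temporal-coherence and redundancy tricks from the papers are omitted.

```python
from collections import defaultdict

# Hypothetical 4x4 board encoded as a sequence of 16 cell values; each n-tuple
# is a fixed set of cell indices whose contents index a lookup table of weights.
TUPLES = [(0, 1, 2, 3), (4, 5, 6, 7), (0, 4, 8, 12), (1, 5, 9, 13)]
weights = [defaultdict(float) for _ in TUPLES]

def value(board):
    """Afterstate value: sum of lookup-table weights over all n-tuples."""
    return sum(w[tuple(board[i] for i in idx)] for w, idx in zip(weights, TUPLES))

def td_update(board, reward, next_board, alpha=0.0025):
    """One TD(0) step toward reward + V(next afterstate)."""
    delta = reward + value(next_board) - value(board)
    for w, idx in zip(weights, TUPLES):
        w[tuple(board[i] for i in idx)] += alpha * delta / len(TUPLES)
```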
(UCB) Auer, Peter, Nicolo Cesa-Bianchi, and Paul Fischer. "Finite-time analysis of the multiarmed bandit problem." Machine learning 47.2-3 (2002): 235-256.
(UCT) Kocsis, Levente, and Csaba Szepesvári. "Bandit based monte-carlo planning." European conference on machine learning. Springer, Berlin, Heidelberg, 2006.
(MCTS) Coulom, Rémi. "Efficient selectivity and backup operators in Monte-Carlo tree search." International conference on computers and games. Springer, Berlin, Heidelberg, 2006.
(RAVE) Gelly, Sylvain, and David Silver. "Monte-Carlo tree search and rapid action value estimation in computer Go." Artificial Intelligence 175.11 (2011): 1856-1875.
Gelly, Sylvain, and David Silver. "Combining online and offline knowledge in UCT." ICML (2007). [ICML 2017 Test of Time Award]
Chaslot, Guillaume MJ-B. "Parallel monte-carlo tree search." International Conference on Computers and Games. Springer, Berlin, Heidelberg, (2008).
Segal, Richard B. "On the scalability of parallel UCT." International Conference on Computers and Games. Springer, Berlin, Heidelberg, (2010).
Browne, Cameron B., et al. "A survey of monte carlo tree search methods." IEEE Transactions on Computational Intelligence and AI in games 4.1 (2012): 1-43.
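The UCB/UCT entries hinge on the UCB1 rule: after n plays in total, pick the arm (or child node) i with empirical mean value and visit count n_i that maximizes

```latex
\bar{x}_i + \sqrt{\frac{2 \ln n}{n_i}}
```

UCT applies this rule recursively at every node of the search tree, usually with a tunable constant in front of the exploration term; RAVE and the parallel-UCT papers change how value estimates are shared and updated rather than the selection rule itself.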
AlphaGo
Silver, David, et al. "Mastering the game of Go with deep neural networks and tree search." Nature 529.7587 (2016): 484-489. [APV-MCTS] ⭐
Silver, David, et al. "Mastering the game of Go without human knowledge." Nature 550.7676 (2017): 354-359. ⭐
Silver, David, et al. "Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm." (2017). [arXiv:1712.01815] ⭐
Silver, David, et al. "A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play." Science 362.6419 (2018): 1140-1144.
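In the AlphaGo Zero and AlphaZero papers the tree policy replaces the UCB1-style exploration term with a prior-weighted PUCT term, using the policy network's prior P(s, a) and visit counts N(s, a):

```latex
a^{*} = \arg\max_a \left( Q(s, a) + c_{\text{puct}}\, P(s, a)\,
        \frac{\sqrt{\sum_b N(s, b)}}{1 + N(s, a)} \right)
```

The network is then trained to match the search's visit-count distribution and the game outcome, closing the self-play loop.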
More
Silver, David, Richard S. Sutton, and Martin Müller. "Temporal-difference search in computer Go." Machine learning 87.2 (2012): 183-219.
Lai, Matthew. "Giraffe: Using deep reinforcement learning to play chess." (2015). [arXiv:1509.01549]
Vinyals, Oriol, et al. "StarCraft II: a new challenge for reinforcement learning." (2017). [arXiv:1708.04782]
Maddison, Chris J., et al. "Move evaluation in go using deep convolutional neural networks." (2014). [arXiv:1412.6564]
Soeda, Shunsuke, and Tomoyuki Kaneko. "Dual lambda search and shogi endgames." Advances in Computer Games. Springer, Berlin, Heidelberg, (2005).
(Darkforest) Tian, Yuandong, and Yan Zhu. "Better computer Go player with neural network and long-term prediction." (2015). [arXiv:1511.06410]
Cazenave, Tristan. "Residual networks for computer Go." IEEE Transactions on Games 10.1 (2018): 107-110.
Gao, Chao, Martin Müller, and Ryan Hayward. "Three-Head Neural Network Architecture for Monte Carlo Tree Search." IJCAI. (2018).
(ELF) Tian, Yuandong, et al. "ELF: An extensive, lightweight and flexible research platform for real-time strategy games." Advances in Neural Information Processing Systems. (2017).
(ELF2) Tian, Yuandong, et al. "ELF OpenGo: An Analysis and Open Reimplementation of AlphaZero." (2019). [arXiv:1902.04522]
Others
Li, Yuxi. "Deep reinforcement learning: An overview." (2017). [arXiv:1701.07274]