Deep-Learning-Paper-List

Excellent Slides

Deep Reinforcement Learning David Silver. ICML 2016 Tutorial.
Memory Networks for Language Understanding Jason Weston. ICML 2016 Tutorial.
Deep Residual Networks: Deep Learning Gets Way Deeper Kaiming He. ICML 2016 Tutorial.
Recent Advances in Non-Convex Optimization Anima Anandkumar. ICML 2016 Tutorial.

Architecture

Recurrent Neural Networks (RNNs)

Long short-term memory LSTM S. Hochreiter and J. Schmidhuber.
On the properties of neural machine translation: Encoder-decoder approaches GRU Kyunghyun Cho, Bart van Merrienboer, Dzmitry Bahdanau, Yoshua Bengio.
Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling LSTM vs. GRU Junyoung Chung, Caglar Gulcehre, Kyunghyun Cho, Yoshua Bengio.
Training RNNs as Fast as CNNs SRU Tao Lei, Yu Zhang.
Optimizing Performance of Recurrent Neural Networks on GPUs cuDNN LSTM Jeremy Appleyard, Tomas Kocisky, Phil Blunsom.

Convolutional Neural Networks (CNNs)

Deep Residual Learning for Image Recognition Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun. CVPR, 2016.
Identity Mappings in Deep Residual Networks Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun. ECCV, 2016.
Deep Residual Networks: Deep Learning Gets Way Deeper Kaiming He. ICML Tutorial, 2016.
Residual Networks Behave Like Ensembles of Relatively Shallow Networks Andreas Veit, Michael Wilber, Serge Belongie. 2016.
Very deep convolutional networks for large-scale image recognition K. Simonyan and A. Zisserman. 2014.

Loss & Optimization

Global and Local Minima, Saddle Points

On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima N. S. Keskar, D. Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang.
The Loss Surfaces of Multilayer Networks A. Choromanska, M. Henaff, M. Mathieu, G. B. Arous, Y. LeCun.
Escaping from Saddle Points - Online Stochastic Gradients for Tensor Decomposition. Rong Ge, Furong Huang, Chi Jin, Yang Yuan.
Deep Learning without Poor Local Minima K. Kawaguchi.

Optimization Algorithm

rmsprop: Divide the gradient by a running average of its recent magnitude G. Hinton, Nitish Srivastava, Kevin Swersky.
Adaptive Subgradient Methods for Online Learning and Stochastic Optimization. J. Duchi, E. Hazan, Y. Singer. JMLR, 2011.
Adadelta - an adaptive learning rate method Matthew D. Zeiler
Adam - A Method for Stochastic Optimization Diederik P. Kingma, Jimmy Ba.
Incorporating Nesterov Momentum into Adam Timothy Dozat.
Large Scale Distributed Deep Networks, Jeffrey Dean, Greg S. Corrado, Rajat Monga, Kai Chen, Matthieu Devin, Quoc V. Le, Mark Z. Mao, Marc’Aurelio Ranzato, Andrew Senior, Paul Tucker, Ke Yang, Andrew Y. Ng. NIPS, 2012.
Asynchronous Stochastic Gradient Descent with Delay Compensation for Distributed Deep Learning, Shuxin Zheng, Qi Meng, Taifeng Wang, Wei Chen, Nenghai Yu, Zhi-Ming Ma, Tie-Yan Liu.

Gradient Vanishing & Activations

Others

Batch normalization: Accelerating deep network training by reducing internal covariate shift S. Ioffe and C. Szegedy.

Regularization

Dropout: A simple way to prevent neural networks from overfitting N. Srivastava et al.
Improving neural networks by preventing co-adaptation of feature detectors G. Hinton et al.

Applications

Natural Language Processing

Question Answering

Memory Networks. J. Weston, S. Chopra, A. Bordes arXiv, 2015.
End-to-End Memory Networks S. Sukhbaatar, A. Szlam, J. Weston, R. Fergus. arXiv, 2015.
Iterative Alternating Neural Attention for Machine Reading A. Sordoni, P. Bachman, Y. Bengio. arXiv, 2016.
Key-Value Memory Networks for Directly Reading Documents A. Miller, A. Fisch, J. Dodge, A. Karimi, A. Bordes, J. Weston. arXiv, 2016.
Question Answering with Subgraph Embeddings A. Bordes, S. Chopra, J. Weston. arXiv, 2014.

Image Classification

ImageNet Classification with Deep Convolutional Neural Networks Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton. NIPS, 2012.
Deep Residual Learning for Image Recognition Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun. CVPR, 2016.
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun.
OverFeat: Integrated recognition, localization and detection using convolutional networks P. Sermanet et al.

Knowledge Graph

Translating Embeddings for Modeling Multi-relational Data A. Bordes, N. Usunier, A. Garcia-Duran, Jason Weston, O. Yakhnenko. NIPS, 2013.
Knowledge Graph Embedding by Translating on Hyperplanes Zhen Wang, Jianwen Zhang, Jianlin Feng, Zheng Chen. AAAI, 2014.
Learning Entity and Relation Embeddings for Knowledge Graph Completion Y. Lin, Zhiyuan Liu, Maosong Sun, Yang Liu, Xuan Zhu. AAAI, 2015.
Knowledge Graph Embedding via Dynamic Mapping Matrix G. Ji, Shizhu He, Liheng Xu, Kang Liu, Jun Zhao. ACL, 2015.
Reasoning With Neural Tensor Networks for Knowledge Base Completion R. Socher, Danqi Chen, C. Manning, A. Ng. NIPS, 2013.

Generative Adversarial Networks (GAN)

Generative adversarial nets GAN I. Goodfellow et al.
Wasserstein GAN WGAN Martin Arjovsky, Soumith Chintala, Léon Bottou.
Conditional Generative Adversarial Nets code Mehdi Mirza, Simon Osindero.
Conditional Image Synthesis With Auxiliary Classifier GANs Augustus Odena, Christopher Olah, Jonathon Shlens.
Improved techniques for training GANs T. Salimans et al.
Semi-supervised Conditional GANs Kumar Sricharan, Raja Bala, Matthew Shreve, Hui Ding, Kumar Saketh, Jin Sun.
Disguise Adversarial Networks for Click-through Rate Prediction Yue Deng, Yilin Shen, Hongxia Jin.

Deep Reinforcement Learning

Human-level control through deep reinforcement learning V. Mnih et al.
Playing Atari with deep reinforcement learning V. Mnih et al.
Asynchronous methods for deep reinforcement learning V. Mnih et al.
Deep Reinforcement Learning with Double Q-Learning H. van Hasselt et al.
Mastering the game of Go with deep neural networks and tree search D. Silver et al.

Existing Problems

Intriguing properties of neural networks Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, Rob Fergus.

Generalization

Understanding Deep Learning Requires Rethinking Generalization Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, Oriol Vinyals.
mixup: Beyond Empirical Risk Minimization. Hongyi Zhang, Moustapha Cisse, Yann N. Dauphin, David Lopez-Paz.

junwei-pan / Deep-Learning-Paper-List
