- Deep Reinforcement Learning David Silver. ICML 2016 Tutorial.
- Memory Networks for Language Understanding Jason Weston. ICML 2016 Tutorial.
- Deep Residual Networks: Deep Learning Gets Way Deeper Kaiming He. ICML 2016 Tutorial.
- Recent Advances in Non-Convex Optimization Anima Anandkumar. ICML 2016 Tutorial.
- Long short-term memory LSTM S. Hochreiter and J. Schmidhuber.
- On the properties of neural machine translation: Encoder-decoder approaches GRU Kyunghyun Cho, Bart van Merrienboer, Dzmitry Bahdanau, Yoshua Bengio.
- Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling LSTM vs. GRU Junyoung Chung, Caglar Gulcehre, Kyunghyun Cho, Yoshua Bengio.
- Training RNNs as Fast as CNNs SRU Tao Lei, Yu Zhang
- Optimizing performance of recurrent neural networks on GPUs cuDNN LSTM Jeremy Appleyard, Tomas Kocisky, Phil Blunsom.
- Deep Residual Learning for Image Recognition Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun. CVPR, 2016.
- Identity Mappings in Deep Residual Networks Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun. ECCV, 2016.
- Deep Residual Networks: Deep Learning Gets Way Deeper Kaiming He. ICML Tutorial, 2016.
- Residual Networks Behave Like Ensembles of Relatively Shallow Networks Andreas Veit, Michael Wilber, Serge Belongie. 2016.
- Very deep convolutional networks for large-scale image recognition K. Simonyan and A. Zisserman. 2014.
- On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima N. S. Keskar, D. Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang.
- The Loss Surfaces of Multilayer Networks A. Choromanska, M. Henaff, M. Mathieu, G. B. Arous, Y. LeCun.
- Escaping from Saddle Points - Online Stochastic Gradient for Tensor Decomposition. Rong Ge, Furong Huang, Chi Jin, Yang Yuan.
- Deep Learning without Poor Local Minima K. Kawaguchi.
- rmsprop: Divide the gradient by a running average of its recent magnitude G. Hinton, Nitish Srivastava, Kevin Swersky.
- Adaptive Subgradient Methods for Online Learning and Stochastic Optimization. J. Duchi, E. Hazan, Y. Singer. JMLR, 2011.
- ADADELTA: An Adaptive Learning Rate Method Matthew D. Zeiler
- Adam: A Method for Stochastic Optimization Diederik P. Kingma, Jimmy Ba.
- Incorporating Nesterov Momentum into Adam Timothy Dozat.
- Large Scale Distributed Deep Networks, Jeffrey Dean, Greg S. Corrado, Rajat Monga, Kai Chen, Matthieu Devin, Quoc V. Le, Mark Z. Mao, Marc’Aurelio Ranzato, Andrew Senior, Paul Tucker, Ke Yang, Andrew Y. Ng. NIPS, 2012.
- Asynchronous Stochastic Gradient Descent with Delay Compensation for Distributed Deep Learning, Shuxin Zheng, Qi Meng, Taifeng Wang, Wei Chen, Nenghai Yu, Zhi-Ming Ma, Tie-Yan Liu.
- Batch normalization: Accelerating deep network training by reducing internal covariate shift S. Ioffe and C. Szegedy.
- Dropout: A simple way to prevent neural networks from overfitting N. Srivastava et al.
- Improving neural networks by preventing co-adaptation of feature detectors G. Hinton et al.
- Efficient Estimation of Word Representations in Vector Space Tomas Mikolov, Kai Chen, Greg Corrado, Jeffrey Dean
- Distributed Representations of Words and Phrases and their Compositionality T. Mikolov, I. Sutskever, Kai Chen, G. Corrado, J. Dean
- GloVe: Global Vectors for Word Representation Jeffrey Pennington, Richard Socher, Christopher D. Manning
- Neural Word Embedding as Implicit Matrix Factorization O. Levy, Y. Goldberg.
- Convolutional Neural Networks for Sentence Classification Yoon Kim.
- Bag of Tricks for Efficient Text Classification A. Joulin, E. Grave, P. Bojanowski, T. Mikolov.
- Text understanding from scratch Xiang Zhang and Yann LeCun. 2015.
- Character-level convolutional networks for text classification Xiang Zhang, Junbo Zhao, and Yann LeCun. NIPS, 2015.
- Statistical Phrase-based Translation P. Koehn, F. J. Och, D. Marcu. NAACL, 2003.
- On the Properties of Neural Machine Translation: Encoder-Decoder Approaches K. Cho, B. v. Merrienboer, D. Bahdanau, Y. Bengio
- Recurrent Continuous Translation Models N. Kalchbrenner, P. Blunsom. 2013.
- Sequence to Sequence Learning with Neural Networks I. Sutskever, O. Vinyals, Q. V. Le. 2014.
- Neural Machine Translation by Jointly Learning to Align and Translate D. Bahdanau, K. Cho, Y. Bengio, 2014.
- On using very large target vocabulary for neural machine translation S. Jean, K. Cho, R. Memisevic, Y. Bengio. ACL, 2015.
- Addressing the Rare Word Problem in Neural Machine Translation Minh-Thang Luong, I. Sutskever, Q. V. Le, O. Vinyals, W. Zaremba.
- Neural Machine Translation of Rare Words with Subword Units R. Sennrich, B. Haddow, A. Birch.
- Deep Residual Learning for Image Recognition Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun. CVPR, 2016.
- Memory Networks. J. Weston, S. Chopra, A. Bordes arXiv, 2015.
- End-to-End Memory Networks S. Sukhbaatar, A. Szlam, J. Weston, R. Fergus. arXiv, 2015.
- Iterative Alternating Neural Attention for Machine Reading A. Sordoni, P. Bachman, Y. Bengio. arXiv, 2016.
- Key-Value Memory Networks for Directly Reading Documents A. Miller, A. Fisch, J. Dodge, A. Karimi, A. Bordes, J. Weston. arXiv, 2016.
- Question Answering with Subgraph Embeddings A. Bordes, J. Weston, S. Chopra. arXiv, 2014.
- ImageNet Classification with Deep Convolutional Neural Networks Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton. NIPS, 2012.
- Deep Residual Learning for Image Recognition Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun. CVPR, 2016.
- Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
- OverFeat: Integrated recognition, localization and detection using convolutional networks P. Sermanet et al.
- Translating Embeddings for Modeling Multi-relational Data A. Bordes, N. Usunier, A. Garcia-Duran, Jason Weston, O. Yakhnenko. NIPS, 2013.
- Knowledge Graph Embedding by Translating on Hyperplanes Zhen Wang, Jianwen Zhang, Jianlin Feng, Zheng Chen. AAAI, 2014.
- Learning Entity and Relation Embeddings for Knowledge Graph Completion Y. Lin, Zhiyuan Liu, Maosong Sun, Yang Liu, Xuan Zhu. AAAI, 2015.
- Knowledge Graph Embedding via Dynamic Mapping Matrix G. Ji, Shizhu He, Liheng Xu, Kang Liu, Jun Zhao. ACL, 2015.
- Reasoning With Neural Tensor Networks for Knowledge Base Completion R. Socher, Danqi Chen, C. Manning, A. Ng. NIPS, 2013.
- Generative adversarial nets GAN I. Goodfellow et al.
- Wasserstein GAN WGAN Martin Arjovsky, Soumith Chintala, Léon Bottou
- Conditional Generative Adversarial Nets Mehdi Mirza, Simon Osindero
- Conditional Image Synthesis With Auxiliary Classifier GANs Augustus Odena, Christopher Olah, Jonathon Shlens
- Improved techniques for training GANs T. Salimans et al.
- Semi-supervised Conditional GANs Kumar Sricharan, Raja Bala, Matthew Shreve, Hui Ding, Kumar Saketh, Jin Sun.
- Disguise Adversarial Networks for Click-through Rate Prediction Yue Deng, Yilin Shen, Hongxia Jin.
- Human-level control through deep reinforcement learning V. Mnih et al.
- Playing Atari with deep reinforcement learning V. Mnih et al.
- Asynchronous methods for deep reinforcement learning V. Mnih et al.
- Deep Reinforcement Learning with Double Q-Learning H. van Hasselt et al.
- Mastering the game of Go with deep neural networks and tree search D. Silver et al.
- Intriguing properties of neural networks Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, Rob Fergus.
- Understanding Deep Learning Requires Rethinking Generalization Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, Oriol Vinyals.
- mixup: Beyond Empirical Risk Minimization. Hongyi Zhang, Moustapha Cisse, Yann N. Dauphin, David Lopez-Paz.