dd2912/ml_papers

1 Introduction to Deep Learning

Text Book

Bengio, Yoshua, Ian J. Goodfellow, and Aaron Courville. Deep learning. An MIT Press book. (2015). pdf

High-level Survey

LeCun, Yann, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature 521.7553 (2015): 436-444. pdf

Courses

MIT 6.S191 Introduction to Deep Learning web
Dive into Deep Learning web

2 Convolutional Neural Networks (CNNs)

LeNet: Image Classification on Handwritten Digits and Image Classification on ImageNet

Y. LeCun, L. Bottou, Y. Bengio and P. Haffner. Gradient-Based Learning Applied to Document Recognition. Proceedings of the IEEE, 86(11):2278-2324. 1998. pdf (Seminal Paper: LeNet)
Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. Imagenet classification with deep convolutional neural networks. Advances in neural information processing systems. 2012. pdf
Simonyan, Karen, and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014). pdf
Szegedy, Christian, et al. Going deeper with convolutions. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015. pdf
He, Kaiming, et al. Deep residual learning for image recognition. arXiv preprint arXiv:1512.03385 (2015). pdf ResNet
Huang, G. et al. Densely Connected Convolutional Networks. arXiv preprint arXiv:1608.06993 (2017) pdf (DenseNet)
Hu, Jie et al. Squeeze-and-Excitation Networks. arXiv preprint arXiv:1709.01507 (2017) pdf
Howard, A. G. et al. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. pdf
Tan, M. and Le, Q. V. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. pdf
Xie, Q. et al. Self-training with Noisy Student improves ImageNet classification. pdf
Bojarski, M. et al. End to End Learning for Self-Driving Cars. pdf

3 Object Detection

H. A. Rowley, S. Baluja, and T. Kanade, Neural network-based face detection, Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognition, pp. 203–208, 1996. pdf
P. Viola and M. Jones, Rapid object detection using a boosted cascade of simple features, in Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. pdf
Szegedy, Christian, Alexander Toshev, and Dumitru Erhan. Deep neural networks for object detection. Advances in Neural Information Processing Systems. 2013. pdf
Girshick, Ross, et al. Rich feature hierarchies for accurate object detection and semantic segmentation. Proceedings of the IEEE conference on computer vision and pattern recognition. 2014. pdf RCNN
He, Kaiming, et al. Spatial pyramid pooling in deep convolutional networks for visual recognition. European Conference on Computer Vision. Springer International Publishing, 2014. pdf SPPNet
Girshick, Ross. Fast R-CNN. Proceedings of the IEEE International Conference on Computer Vision. 2015. pdf
Ren, Shaoqing, et al. Faster R-CNN: Towards real-time object detection with region proposal networks. Advances in neural information processing systems. 2015. pdf
Redmon, Joseph, et al. You only look once: Unified, real-time object detection. arXiv preprint arXiv:1506.02640 (2015). pdf
Liu, Wei, et al. SSD: Single Shot MultiBox Detector. arXiv preprint arXiv:1512.02325 (2015). pdf
Dai, Jifeng, et al. R-FCN: Object Detection via Region-based Fully Convolutional Networks. arXiv preprint arXiv:1605.06409 (2016). pdf
K. He et al. Mask R-CNN. arXiv preprint arXiv:1703.06870 (2017). pdf
Tsung-Yi Lin et al. Feature Pyramid Networks for Object Detection. arXiv:1612.03144 (2017). pdf
Esteban Real, Alok Aggarwal, Yanping Huang: Regularized Evolution for Image Classifier Architecture Search, 2018; arXiv:1802.01548 pdf
Golnaz Ghiasi, Tsung-Yi Lin, Ruoming Pang: NAS-FPN: Learning Scalable Feature Pyramid Architecture for Object Detection, 2019; arXiv:1904.07392 pdf
Chenchen Zhu, Yihui He: Feature Selective Anchor-Free Module for Single-Shot Object Detection, 2019; arXiv:1903.00621 pdf
Yukang Chen, Tong Yang, Xiangyu Zhang, Gaofeng Meng, Xinyu Xiao: DetNAS: Backbone Search for Object Detection, 2019; arXiv:1903.10979 pdf
Kaiwen Duan, Song Bai, Lingxi Xie, Honggang Qi, Qingming Huang: CenterNet: Keypoint Triplets for Object Detection, 2019; arXiv:1904.08189 pdf
Mingxing Tan, Ruoming Pang: EfficientDet: Scalable and Efficient Object Detection, 2019; arXiv:1911.09070 pdf

4 Object Segmentation and Self-Supervised Learning

Segmentation:

J. Long, E. Shelhamer, and T. Darrell, Fully convolutional networks for semantic segmentation. in CVPR, 2015. pdf
O. Ronneberger et al. U-Net: Convolutional Networks for Biomedical Image Segmentation. 2015. pdf
Multi-Scale Context Aggregation by Dilated Convolutions. 2016. pdf
DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. 2016. pdf
Rethinking Atrous Convolution for Semantic Image Segmentation. 2017. pdf
K. He et al. Mask R-CNN. arXiv preprint arXiv:1703.06870. 2017. pdf
Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. 2018. pdf
Learning to Segment Everything. 2018. pdf

Self-Supervised Learning:

Unsupervised Visual Representation Learning by Context Prediction. 2015. pdf
Colorful Image Colorization. 2016. pdf
Representation Learning by Learning to Count. 2017. pdf
Learning and Using the Arrow of Time. 2018. pdf
Tracking Emerges by Colorizing Videos. 2018. pdf
Audio-Visual Scene Analysis with Self-Supervised Multi-sensory Features. 2018. pdf
Object Discovery with a Copy-Pasting GAN. 2019. pdf
SimCLR: A Simple Framework for Contrastive Learning of Representations. 2020. pdf

5 Generative Adversarial Networks and Applications

Generative Adversarial Networks:

Kingma, D, and Welling, M. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114 (2013). pdf
Goodfellow, Ian, et al. Generative adversarial nets. 2014. pdf
Oord, Aaron van den, Nal Kalchbrenner, and Koray Kavukcuoglu. Pixel recurrent neural networks. arXiv preprint arXiv:1601.06759 (2016). pdf
Makhzani, Alireza, et al. Adversarial Autoencoders. arXiv:1511.05644 (2015). pdf
Gregor, Karol, et al. DRAW: A recurrent neural network for image generation. arXiv:1502.04623 (2015). pdf

Applications:

Wasserstein GAN. 2017. pdf
Large Scale GAN Training for High Fidelity Natural Image Synthesis. 2018. pdf
A Style-based Generator Architecture for Generative Adversarial Networks. 2018. pdf
Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. 2017. pdf
Conditional LSTM-GAN for Melody Generation from Lyrics. 2019. pdf
GANFIT: Generative Adversarial Network Fitting for High Fidelity 3D Face Reconstruction. 2019. pdf

Art:

Mordvintsev, Alexander; Olah, Christopher; Tyka, Mike (2015). Inceptionism: Going Deeper into Neural Networks. Google Research. html
Gatys, Leon A., Alexander S. Ecker, and Matthias Bethge. A neural algorithm of artistic style. arXiv preprint arXiv:1508.06576 (2015). pdf
CAN: Creative Adversarial Networks. 2017. pdf
Semantic Image Synthesis with Spatially-Adaptive Normalization. 2019. pdf
Deep Poetry: Word-Level and Char-Level Language Models for Shakespearean Sonnet Generation pdf
BachProp: Learning to Compose Music in Multiple Styles 2018. pdf
A 'New' Rembrandt: From the Frontiers of AI And Not The Artist's Atelier 2016. html
Is artificial intelligence set to become art’s next medium? 2018. html
AI Will Enhance - Not End - Human Art 2019. html
An AI-Written Novella Almost Won a Literary Prize 2016. html
How AI-Generated Music Is Changing The Way Hits Are Made 2018. html
AI puts final notes on Beethoven's Tenth Symphony 2019. html

Previous Papers

Zhu, Jun-Yan, et al. Generative Visual Manipulation on the Natural Image Manifold. European Conference on Computer Vision. Springer International Publishing, 2016. pdf
Champandard, Alex J. Semantic Style Transfer and Turning Two-Bit Doodles into Fine Artworks. arXiv preprint arXiv:1603.01768 (2016). pdf
Johnson, Justin, Alexandre Alahi, and Li Fei-Fei. Perceptual losses for real-time style transfer and super-resolution. arXiv preprint arXiv:1603.08155 (2016). pdf
Vincent Dumoulin, Jonathon Shlens and Manjunath Kudlur. A learned representation for artistic style. arXiv preprint arXiv:1610.07629 (2016). pdf
Gatys, Leon and Ecker, et al. Controlling Perceptual Factors in Neural Style Transfer. arXiv preprint arXiv:1611.07865 (2016). pdf
Ulyanov, Dmitry and Lebedev, Vadim, et al. Texture Networks: Feed-forward Synthesis of Textures and Stylized Images. arXiv preprint arXiv:1603.03417 (2016). pdf

6 RNN / Sequence-to-Sequence Model

Bengio, Yoshua, et al. A Neural Probabilistic Language Model. JMLR (2003). pdf
Graves, Alex. Generating sequences with recurrent neural networks. arXiv preprint arXiv:1308.0850 (2013). (LSTM; very nice generation results that show the power of RNNs) pdf
Mikolov, et al. Distributed representations of words and phrases and their compositionality. NIPS (2013) pdf
Sutskever, Ilya, Oriol Vinyals, and Quoc V. Le. Sequence to sequence learning with neural networks. Advances in neural information processing systems. 2014. pdf
Bahdanau, Dzmitry, KyungHyun Cho, and Yoshua Bengio. Neural Machine Translation by Jointly Learning to Align and Translate. (2014). pdf

7 NLP (Natural Language Processing)

Ashish Vaswani, et al. Attention is All you Need. NIPS (2017) pdf
Matthew Peters, et al. Deep Contextualized Word Representations. pdf
Jeremy Howard, et al. Universal Language Model Fine-Tuning for Text Classification ACL (2018) pdf
Jacob Devlin, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (2019) pdf
Victor Sanh, et al. DistilBERT, a distilled version of BERT. arXiv preprint arXiv:1910.01108 (2019) pdf

8 Machine Translation

Lee, et al. Fully Character-Level Neural Machine Translation without Explicit Segmentation. (2016) pdf
Wu, Schuster, Chen, Le, et al. Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation. pdf
Jonas Gehring, et al. Convolutional Sequence to Sequence Learning. (2017). pdf
Lample, et al. Phrase-Based & Neural Unsupervised Machine Translation. (2018) pdf
Ye Jia, et al. Direct Speech-to-Speech Translation with a Sequence-to-Sequence Model. (2019). pdf

9 Applications of Sequence-to-Sequence Models

Wen, et al. Recurrent Neural Network Language Generation for Spoken Dialogue Systems. (2019) pdf
Mrksic, et al. Multi-domain Dialog State Tracking using RNNs. (2015) pdf
Srinivasan, et al. Natural Language Generation using Reinforcement Learning with External Rewards. (2019). pdf
Zhu, et al. SDNet: Contextualized Attention-based Deep Network for Conversational Question Answering. (2018) pdf
Xiong, et al. Achieving Human Parity in Conversational Speech Recognition. arXiv:1610.05256 (2016). pdf

10 Reinforcement Learning

Mnih, Volodymyr, et al. Playing atari with deep reinforcement learning. (2013). pdf
Silver, David, et al. Mastering the game of Go with deep neural networks and tree search. (2016) pdf
Silver, David, et al. Mastering the game of Go without Human Knowledge. (2017) pdf
Silver, David, et al. Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm. (2017) pdf
OpenAI. Learning Dexterous In-Hand Manipulation. pdf

Previous Papers

Mnih, Volodymyr, et al. Human-level control through deep reinforcement learning. (2015) pdf
Wang, Ziyu, Nando de Freitas, and Marc Lanctot. Dueling network architectures for deep reinforcement learning. (2015). pdf
Mnih, Volodymyr, et al. Asynchronous methods for deep reinforcement learning. (2016). pdf
Lillicrap, Timothy P., et al. Continuous control with deep reinforcement learning. (2015). pdf
Gu, Shixiang, et al. Continuous Deep Q-Learning with Model-based Acceleration. (2016). pdf
Schulman, John, et al. Trust region policy optimization. CoRR, abs/1502.05477 (2015). pdf

11 Unsupervised Learning / Deep Generative Model

Le, Quoc V. Building high-level features using large scale unsupervised learning. pdf
Kingma, Diederik P., and Max Welling. Auto-encoding variational bayes. (2013). pdf
Goodfellow, Ian, et al. Generative adversarial nets. Advances in Neural Information Processing Systems. 2014. pdf
Radford, Alec, Luke Metz, and Soumith Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. (2015). pdf
Gregor, Karol, et al. DRAW: A recurrent neural network for image generation. (2015). pdf
Oord, Aaron van den, Nal Kalchbrenner, and Koray Kavukcuoglu. Pixel recurrent neural networks. (2016). pdf
Oord, Aaron van den, et al. Conditional image generation with PixelCNN decoders. (2016). pdf

12 Image Captioning

Farhadi, Ali, et al. Every picture tells a story: Generating sentences from images. 2010. pdf
Kulkarni, Girish, et al. Baby talk: Understanding and generating image descriptions. 2011. pdf
Vinyals, Oriol, et al. Show and tell: A neural image caption generator. 2014. pdf
Donahue, Jeff, et al. Long-term recurrent convolutional networks for visual recognition and description. pdf
Karpathy, Andrej, and Li Fei-Fei. Deep visual-semantic alignments for generating image descriptions. 2014. pdf
Karpathy, Andrej, Armand Joulin, and Fei Fei F. Li. Deep fragment embeddings for bidirectional image sentence mapping. 2014. pdf
Fang, Hao, et al. From captions to visual concepts and back. 2014. pdf
Chen, Xinlei, and C. Lawrence Zitnick. Learning a recurrent visual representation for image caption generation. 2014. pdf
Mao, Junhua, et al. Deep captioning with multimodal recurrent neural networks. 2014. pdf
Xu, Kelvin, et al. Show, attend and tell: Neural image caption generation with visual attention. 2015. pdf

13 Speech Recognition

Hinton, Geoffrey, et al. Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. (2012) pdf
Graves, Alex, Abdel-rahman Mohamed, and Geoffrey Hinton. Speech recognition with deep recurrent neural networks. 2013 pdf
Graves, Alex, and Navdeep Jaitly. Towards End-To-End Speech Recognition with Recurrent Neural Networks. 2014 pdf
Sak, Haşim, et al. Fast and accurate recurrent neural network acoustic models for speech recognition. (2015). pdf
Amodei, Dario, et al. Deep speech 2: End-to-end speech recognition in english and mandarin. (2015). pdf
W. Xiong, J. Droppo, X. Huang, F. Seide, M. Seltzer, A. Stolcke, D. Yu, G. Zweig Achieving Human Parity in Conversational Speech Recognition. (2016) pdf

14 Deep Learning Optimization and More

Hinton, Geoffrey E., et al. Improving neural networks by preventing co-adaptation of feature detectors. pdf
Srivastava, Nitish, et al. Dropout: a simple way to prevent neural networks from overfitting. pdf
Ioffe, Sergey, and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. pdf
Ba, Jimmy Lei, Jamie Ryan Kiros, and Geoffrey E. Hinton. Layer normalization. pdf
Courbariaux, Matthieu, et al. Binarized Neural Networks: Training Neural Networks with Weights and Activations Constrained to +1 or −1. pdf
Jaderberg, Max, et al. Decoupled neural interfaces using synthetic gradients. pdf
Chen, Tianqi, Ian Goodfellow, and Jonathon Shlens. Net2net: Accelerating learning via knowledge transfer. pdf
Wei, Tao, et al. Network Morphism. arXiv preprint arXiv:1603.01670 (2016). pdf
Sutskever, Ilya, et al. On the importance of initialization and momentum in deep learning. pdf
Kingma, Diederik, and Jimmy Ba. Adam: A method for stochastic optimization. pdf
Andrychowicz, Marcin, et al. Learning to learn by gradient descent by gradient descent. pdf
Han, Song, Huizi Mao, and William J. Dally. Deep compression: Compressing deep neural network with pruning, trained quantization and huffman coding. pdf
Iandola, Forrest N., et al. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size. pdf

15 Robotics

Koutník, Jan, et al. Evolving large-scale neural networks for vision-based reinforcement learning. Proceedings of the 15th annual conference on Genetic and evolutionary computation. ACM, 2013. pdf
Levine, Sergey, et al. End-to-end training of deep visuomotor policies. Journal of Machine Learning Research 17.39 (2016): 1-40. pdf
Pinto, Lerrel, and Abhinav Gupta. Supersizing self-supervision: Learning to grasp from 50k tries and 700 robot hours. arXiv preprint arXiv:1509.06825 (2015). pdf
Levine, Sergey, et al. Learning Hand-Eye Coordination for Robotic Grasping with Deep Learning and Large-Scale Data Collection. arXiv preprint arXiv:1603.02199 (2016). pdf
Zhu, Yuke, et al. Target-driven Visual Navigation in Indoor Scenes using Deep Reinforcement Learning. arXiv preprint arXiv:1609.05143 (2016). pdf
Yahya, Ali, et al. Collective Robot Reinforcement Learning with Distributed Asynchronous Guided Policy Search. arXiv preprint arXiv:1610.00673 (2016). pdf
Gu, Shixiang, et al. Deep Reinforcement Learning for Robotic Manipulation. arXiv preprint arXiv:1610.00633 (2016). pdf
A Rusu, M Vecerik, Thomas Rothörl, N Heess, R Pascanu, R Hadsell. Sim-to-Real Robot Learning from Pixels with Progressive Nets. arXiv preprint arXiv:1610.04286 (2016). pdf
Mirowski, Piotr, et al. Learning to navigate in complex environments. arXiv preprint arXiv:1611.03673 (2016). pdf

16 Deep Transfer Learning / Lifelong Learning / especially for RL

Bengio, Yoshua. Deep Learning of Representations for Unsupervised and Transfer Learning. ICML Unsupervised and Transfer Learning 27 (2012): 17-36. pdf (A Tutorial)
Silver, Daniel L., Qiang Yang, and Lianghao Li. Lifelong Machine Learning Systems: Beyond Learning Algorithms. AAAI Spring Symposium: Lifelong Machine Learning. 2013. pdf (A brief discussion about lifelong learning)
Hinton, Geoffrey, Oriol Vinyals, and Jeff Dean. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531 (2015). pdf (Godfather's Work)
Rusu, Andrei A., et al. Policy distillation. arXiv preprint arXiv:1511.06295 (2015). pdf (RL domain)
Parisotto, Emilio, Jimmy Lei Ba, and Ruslan Salakhutdinov. Actor-mimic: Deep multitask and transfer reinforcement learning. arXiv preprint arXiv:1511.06342 (2015). pdf (RL domain)
Rusu, Andrei A., et al. Progressive neural networks. arXiv preprint arXiv:1606.04671 (2016). pdf (Outstanding Work, A novel idea)

17 One Shot Deep Learning

Lake, Brenden M., Ruslan Salakhutdinov, and Joshua B. Tenenbaum. Human-level concept learning through probabilistic program induction. Science 350.6266 (2015): 1332-1338. pdf (No Deep Learning, but worth reading)
Koch, Gregory, Richard Zemel, and Ruslan Salakhutdinov. Siamese Neural Networks for One-shot Image Recognition. (2015) pdf
Santoro, Adam, et al. One-shot Learning with Memory-Augmented Neural Networks. arXiv preprint arXiv:1605.06065 (2016). pdf (A basic step to one shot learning)
Vinyals, Oriol, et al. Matching Networks for One Shot Learning. arXiv preprint arXiv:1606.04080 (2016). pdf
Hariharan, Bharath, and Ross Girshick. Low-shot visual object recognition. arXiv preprint arXiv:1606.02819 (2016). pdf (A step to large data)

18 Neural Turing Machine

Graves, Alex, Greg Wayne, and Ivo Danihelka. Neural turing machines. arXiv preprint arXiv:1410.5401 (2014). pdf (Basic Prototype of Future Computer)
Zaremba, Wojciech, and Ilya Sutskever. Reinforcement learning neural Turing machines. arXiv preprint arXiv:1505.00521 (2015). pdf
Weston, Jason, Sumit Chopra, and Antoine Bordes. Memory networks. arXiv preprint arXiv:1410.3916 (2014). pdf
Sukhbaatar, Sainbayar, Jason Weston, and Rob Fergus. End-to-end memory networks. Advances in neural information processing systems. 2015. pdf
Vinyals, Oriol, Meire Fortunato, and Navdeep Jaitly. Pointer networks. Advances in Neural Information Processing Systems. 2015. pdf
Graves, Alex, et al. Hybrid computing using a neural network with dynamic external memory. Nature (2016). pdf

Credit: Prof. Peter N. Belhumeur
