Deep Learning for NLP resources

Introductory and state of the art resources for NLP sequence modeling tasks like dialog.

##Machine Learning: Neural Networks, RNN, LSTM Coursera: Machine Learning
Andrew Ng
Introductory course for linear regression, logistic regression, and neural networks.
Also covers support vector machines, k-means, etc.

Machine Learning for Developers
Intro to basic ML concepts for developers.

Deep Learning (Book)
Yoshua Bengio
Advanced book about deep learning.

A few useful things to know about machine learning
Pedro Domingos

Understanding LSTM Networks
Blog post by Chris Olah.

Deep Learning for NLP

Stanford Natural Language Processing
Intro NLP course with videos. This has no deep learning. But it is a good primer for traditional nlp.

Stanford CS 224D: Deep Learning for NLP class
Richard Socher. (2016) Class with syllabus, and slides.
Videos: [2015 lectures] (https://www.youtube.com/channel/UCsGC3XXF1ThHwtDo18d7WVw/videos) / [2016 lectures] (https://www.youtube.com/playlist?list=PLcGUo322oqu9n4i0X3cRJgKyVy7OkDdoi)

A Primer on Neural Network Models for Natural Language Processing
Yoav Goldberg. October 2015. No new info, 75 page summary of state of the art.

Word Vectors

Resources about word vectors, aka word embeddings, and distributed representations for words.
Word vectors are numeric representations of words where similar words have similar vectors. Word vectors are often used as input to deep learning systems. This process is sometimes called pretraining.

A neural probabilistic language model.
Bengio 2003. Seminal paper on word vectors.

Efficient Estimation of Word Representations in Vector Space
Mikolov et al. 2013. Word2Vec generates word vectors in an unsupervised way by attempting to predict words from a corpus. Describes Continuous Bag-of-Words (CBOW) and Continuous Skip-gram models for learning word vectors.
Skip-gram takes center word and predict outside words. Skip-gram is better for large datasets.
CBOW - takes outside words and predict the center word. CBOW is better for smaller datasets.
[Distributed Representations of Words and Phrases and their Compositionality] (http://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf)
Mikolov et al. 2013. Learns vectors for phrases such as "New York Times." Includes optimizations for skip-gram: heirachical softmax, and negative sampling. Subsampling frequent words. (i.e. frequent words like "the" are skipped periodically to speed things up and improve vector for less frequently used words)
Linguistic Regularities in Continuous Space Word Representations
Mikolov et al. 2013. Performs well on word similarity and analogy task. Expands on famous example: King – Man + Woman = Queen
Word2Vec source code
Word2Vec tutorial in TensorFlow

word2vec Parameter Learning Explained
Rong 2014

Deep Learning, NLP, and Representations
Chris Olah (2014) Blog post explaining word2vec.

GloVe: Global vectors for word representation
Pennington, Socher, Manning. 2014. Creates word vectors and relates word2vec to matrix factorizations. Evalutaion section led to controversy by Yoav Goldberg
Glove source code and training data

Thought Vectors

Thought vectors are numeric representations for sentences, paragraphs, and documents. This concept is used for many text classification tasks such as sentiment analysis.

Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank
Socher et al. 2013. Introduces Recursive Neural Tensor Network. Uses a parse tree.

Distributed Representations of Sentences and Documents
Le, Mikolov. 2014. Introduces Paragraph Vector. Concatenates and averages pretrained, fixed word vectors to create vectors for sentences, paragraphs and documents. Also known as paragraph2vec. Doesn't use a parse tree.
Implemented in gensim. See doc2vec tutorial

##Machine Translation Neural Machine Translation by jointly learning to align and translate
Bahdanau, Cho 2014. "comparable to the existing state-of-the-art phrase-based system on the task of English-to-French translation." Implements attention mechanism.
English to French Demo

Sequence to Sequence Learning with Neural Networks
Sutskever, Vinyals, Le 2014. (nips presentation). Uses LSTM RNNs to generate translations. " Our main result is that on an English to French translation task from the WMT’14 dataset, the translations produced by the LSTM achieve a BLEU score of 34.8"
seq2seq tutorial in TensorFlow.

##Single Exchange Dialog A Neural Network Approach toContext-Sensitive Generation of Conversational Responses
Sordoni 2015. Generates responses to tweets.
Uses Recurrent Neural Network Language Model (RLM) architecture of (Mikolov et al., 2010). source code: RNNLM Toolkit

Neural Responding Machine for Short-Text Conversation
Shang et al. 2015 Uses Neural Responding Machine. Trained on Weibo dataset. Achieves one round conversations with 75% appropriate responses.

A Neural Conversation Model
Vinyals, Le 2015. Uses LSTM RNNs to generate conversational responses. Uses seq2seq framework. Seq2Seq was originally designed for machine transation and it "translates" a single sentence, up to around 79 words, to a single sentence response, and has no memory of previous dialog exchanges. Used in Google Smart Reply feature for Inbox

##Memory and Attention Models Attention mechanisms allows the network to refer back to the input sequence, instead of forcing it to encode all information into one fixed-length vector. - Attention and Memory in Deep Learning and NLP

Memory Networks Weston et. al 2014, and End-To-End Memory Networks Sukhbaatar et. al 2015.
Memory networks are implemented in MemNN. Attempts to solve task of reason attention and memory.
Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks
Weston 2015. Classifies QA tasks like single factoid, yes/no etc. Extends memory networks.
Evaluating prerequisite qualities for learning end to end dialog systems
Dodge et. al 2015. Tests Memory Networks on 4 tasks including reddit dialog task.
See Jason Weston lecture on MemNN

Neural Turing Machines
Graves et al. 2014.

Inferring Algorithmic Patterns with Stack-Augmented Recurrent Nets
Joulin, Mikolov 2015. Stack RNN source code and blog post

Reasoning, Attention and Memory RAM workshop at NIPS 2015. slides included

eden57 / DL4NLP

Deep Learning for NLP resources

Deep Learning for NLP

Word Vectors

Thought Vectors

About