sahilkhose / CS224N

Solutions for Stanford CS224n, Winter 2020.

Home Page: https://web.stanford.edu/class/cs224n/




CS224n: Natural Language Processing with Deep Learning (Stanford / Winter 2020)

This repository contains solutions to the practical assignments of CS224n (Natural Language Processing with Deep Learning), offered by Stanford University in Winter 2020 and taught by Christopher Manning.


Assignments

  1. Assignment 1: Exploring Word Vectors
  2. Assignment 2: Word2Vec from Scratch
  3. Assignment 3: Neural Dependency Parser
  4. Assignment 4: Seq2Seq Machine Translation
    Corpus BLEU: 35.8780 es-en translation
  5. Assignment 5: Hybrid Word-Character Seq2Seq Machine Translation
    Corpus BLEU: 37.0347 es-en translation


Assignment 1. Word Embeddings

This assignment has two parts, both about representing words with dense vectors, which are useful in downstream NLP tasks. The first method of deriving word vectors stems from co-occurrence matrices and SVD. The second method is prediction-based and relies on maximum-likelihood training.

1. Count-Based Word Vectors

In this part, you use co-occurrence matrices to develop dense word vectors. A co-occurrence matrix counts how often different words appear together across the documents. To build it, we slide a window of fixed size w over all of the documents and count how many times two words v_i and v_j occur together within a window, putting this count in the (i, j) entry of the matrix.
Then, we run a dimensionality reduction on the co-occurrence matrix using singular value decomposition (SVD) and keep the top r components, which yields r-dimensional embeddings for the words.
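
To make the pipeline concrete, here is a minimal sketch of the count-based approach. The toy corpus, window size, and the use of scikit-learn's TruncatedSVD are my own illustrative choices, not the assignment's starter code:

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

# Toy corpus (illustrative only).
corpus = [["all", "that", "glitters", "is", "not", "gold"],
          ["all", "is", "well", "that", "ends", "well"]]
window = 2  # fixed window size w

# Build the vocabulary and an index for each word.
vocab = sorted({w for doc in corpus for w in doc})
idx = {w: i for i, w in enumerate(vocab)}

# Count co-occurrences of word pairs within the window.
M = np.zeros((len(vocab), len(vocab)))
for doc in corpus:
    for i, word in enumerate(doc):
        for j in range(max(0, i - window), min(len(doc), i + window + 1)):
            if j != i:
                M[idx[word], idx[doc[j]]] += 1

# Reduce to r-dimensional embeddings with truncated SVD.
r = 2
svd = TruncatedSVD(n_components=r)
embeddings = svd.fit_transform(M)  # shape: (len(vocab), r)
print(embeddings.shape)
```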


2. Prediction (or Maximum-Likelihood)-Based Word Vectors: Word2Vec

In this part, you work with pre-trained word2vec embeddings through the gensim package. There are lots of tasks in this part. First, you reduce the dimensionality of the 300-dimensional word vectors to 2 using SVD so that you can visualize the vectors and analyze the visualization. Then you find the closest word vectors to a given word vector, and you explore words with several meanings (polysemous words). You also get to know the analogy task, introduced in the original word2vec paper (Mikolov et al., 2013). The task is simple: given words x, y, and z, find a word w such that the following relationship holds: x is to y as z is to w. For example, Rome is to Italy as D.C. is to the United States. You will find that solving this task with word2vec vectors reduces to simple addition and subtraction of vectors, which is a nice property of word2vec.
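
Here is a hedged sketch of the gensim workflow described above. The pre-trained model name and the example words are my own assumptions, not the assignment's exact setup:

```python
import gensim.downloader as api

# Load pre-trained 300-d word2vec vectors (assumed model name; the assignment
# may load a different set of pre-trained vectors).
wv = api.load("word2vec-google-news-300")

# Nearest neighbours of a given word vector.
print(wv.most_similar("Italy", topn=5))

# Analogy task: x is to y as z is to w, solved as  w ≈ y - x + z.
# Example: "Rome" is to "Italy" as "Washington" is to ...
# ("Washington" stands in for D.C. here, purely for illustration.)
print(wv.most_similar(positive=["Italy", "Washington"], negative=["Rome"], topn=1))
```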


Assignment 2. Word2Vec from Scratch

In this assignment, you get familiar with the word2vec algorithm. The key insight behind word2vec is that "a word is known by the company it keeps". The word2vec paper introduces two models based on this idea: Skip-gram and Continuous Bag of Words (CBOW). In this assignment, you implement the Skip-gram model from scratch with NumPy. You implement both versions of Skip-gram: the first with the naive softmax loss, and the second, which is much faster, with the negative sampling loss. You implement both the forward and backward passes of the two versions from scratch. Your implementation of the first version is only sanity-checked on a small dataset, but you have to run the second version on the Stanford Sentiment Treebank, which takes roughly an hour. I highly recommend that anyone who wants a deep understanding of word2vec first do the theoretical part of this assignment (available here) and the practical part afterwards.
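
As a reference point, below is a minimal NumPy sketch of the naive-softmax loss for a single (center word, outside word) pair. The shapes and the function signature are my own assumptions; the assignment's actual code operates on batches and uses different argument names:

```python
import numpy as np

def softmax(x):
    x = x - np.max(x)           # shift for numerical stability
    e = np.exp(x)
    return e / e.sum()

def naive_softmax_loss(v_c, o, U):
    """
    v_c : (d,)   center word vector
    o   : int    index of the true outside word
    U   : (V, d) matrix of outside word vectors (one row per vocabulary word)
    Returns the loss and its gradients w.r.t. v_c and U.
    """
    y_hat = softmax(U @ v_c)        # predicted distribution over the vocabulary
    loss = -np.log(y_hat[o])        # cross-entropy with the true outside word

    delta = y_hat.copy()
    delta[o] -= 1.0                 # (y_hat - y)
    grad_vc = U.T @ delta           # gradient w.r.t. the center vector, (d,)
    grad_U = np.outer(delta, v_c)   # gradient w.r.t. all outside vectors, (V, d)
    return loss, grad_vc, grad_U
```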


Assignment 3. Neural Dependency Parsing

If you have taken a compilers course before, you have heard the term "parsing". This assignment is about dependency parsing, where you train a model that predicts the dependencies between the words of a sentence. If you remember the shift-reduce parser from your compilers class, you will find the ideas here quite familiar. The main difference is that we use a neural network to predict the dependencies.

In the theoretical part of the assignment (handout available here), the Adam optimizer is first introduced and you answer some questions about it. Then, there is a question about Dropout as a regularization technique. Both the Adam optimizer and Dropout are used in the neural dependency parser you will implement with PyTorch.
The parser performs one of three moves at each step: 1) Shift, 2) Left-arc, 3) Right-arc. You can read more about the details of these moves in the assignment handout. Your network has to predict one of these moves at every step. To predict each move, your model needs features, which are extracted from the stack and the buffer at each stage (a stack and a buffer are maintained throughout parsing, keeping track of what has already been parsed and what remains). The good news is that the feature-extraction code is given to you, so you can focus on the neural network part! There are lots of hints throughout the assignment --as this is the first assignment in the course where students work with PyTorch-- that walk you through implementing each part.
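
To illustrate the transition system, here is a small sketch of a partial parse applying the three moves. The class and method names are illustrative and not necessarily identical to the assignment's starter code:

```python
class PartialParse:
    def __init__(self, sentence):
        self.stack = ["ROOT"]          # words already (partially) processed
        self.buffer = list(sentence)   # words waiting to be processed
        self.dependencies = []         # (head, dependent) arcs found so far

    def parse_step(self, transition):
        if transition == "S":                       # Shift: move next word onto the stack
            self.stack.append(self.buffer.pop(0))
        elif transition == "LA":                    # Left-arc: second-to-top depends on top
            dependent = self.stack.pop(-2)
            self.dependencies.append((self.stack[-1], dependent))
        elif transition == "RA":                    # Right-arc: top depends on second-to-top
            dependent = self.stack.pop(-1)
            self.dependencies.append((self.stack[-1], dependent))

# Example: parse "I ate fish" with a hand-chosen transition sequence.
pp = PartialParse(["I", "ate", "fish"])
for t in ["S", "S", "LA", "S", "RA", "RA"]:
    pp.parse_step(t)
print(pp.dependencies)  # [('ate', 'I'), ('ate', 'fish'), ('ROOT', 'ate')]
```

In the assignment, the transition at each step is chosen by the neural network from features of the current stack and buffer, rather than by a hand-written sequence as in this toy example.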


Assignment 4. Seq2Seq Machine Translation

In my opinion, this is the most important assignment of the course. You implement a Seq2Seq model that translates Spanish sentences into English, based on Luong et al. 2015. Along the way you pick up important practical skills, such as working with recurrent neural networks in PyTorch, the differences between training and test time in RNNs, and implementing the attention mechanism. The pipeline and the implementations provided for you are standard and inspired by the OpenNMT package. I highly recommend that you not only implement what is left for you, but also go further and carefully study what the TAs have provided, from reading inputs from the CLI to the evaluation metrics for NMT models and the decoding algorithms for RNNs such as beam search. There are lots of PyTorch techniques and functions you can pick up and use in your future projects.
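
For a flavour of the attention computation, here is a hedged PyTorch sketch of one step of multiplicative (Luong-style) attention at a single decoder time step. The tensor names and shapes are my own assumptions and do not mirror the assignment's exact implementation:

```python
import torch

def attention_step(dec_hidden, enc_hiddens, W_att):
    """
    dec_hidden  : (b, h)       decoder hidden state at the current time step
    enc_hiddens : (b, src, 2h) encoder hidden states, one per source word
    W_att       : (2h, h)      learned projection for multiplicative attention
    """
    # Scores e_{t,i} = (W_att h_i^enc) . h_t^dec for every source position i.
    proj = enc_hiddens @ W_att                                     # (b, src, h)
    scores = torch.bmm(proj, dec_hidden.unsqueeze(2)).squeeze(2)   # (b, src)

    alpha = torch.softmax(scores, dim=1)                           # attention distribution
    context = torch.bmm(alpha.unsqueeze(1), enc_hiddens).squeeze(1)  # (b, 2h) context vector
    return context, alpha
```

The context vector is then combined with the decoder hidden state to produce the output distribution over target words at that time step.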


Assignment 5. Hybrid Word-Character Seq2Seq Machine Translation

The idea behind this assignment is the same as the previous one, except that the model becomes more powerful because we combine character-level and word-level language modelling. Whenever the NMT model from Assignment 4 generates an <unk> token, we do not put it in the output. Instead, we run a character-level decoder and generate the word character by character. This hybrid word-character approach was proposed by Luong and Manning (2016) and turns out to be effective at improving the performance of the NMT model.
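
Conceptually, the decoding rule looks like the sketch below. Both word_decoder_step and char_decoder_generate are hypothetical helpers used only to illustrate the control flow; they are not functions from the assignment:

```python
def hybrid_decode(src_encoding, max_len, word_decoder_step, char_decoder_generate):
    """Greedy word-level decoding with a character-level fallback for <unk>."""
    output = []
    state = None
    for _ in range(max_len):
        # One step of the word-level decoder (hypothetical helper).
        word, state = word_decoder_step(src_encoding, state)
        if word == "</s>":
            break
        if word == "<unk>":
            # Fall back to the character-level decoder and build the word
            # character by character from the current decoder state.
            word = char_decoder_generate(state)
        output.append(word)
    return output
```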

