youngmihuang / cs224n_exercise

assignment practice


CS224n_exercise

  • Stanford 2019 CS224n: Natural Language Processing with Deep Learning: course link

Schedule

| Due Date | Assignment | Done |
| --- | --- | --- |
| 1/15 | Assignment 1: Exploring Word Vectors | V |
| 1/22 | Assignment 2: Word2vec | V |
| 1/29 | Assignment 3: Dependency Parsing | V |
| 2/7 | Assignment 4: Neural Machine Translation with RNNs | V |
| 2/22 | Assignment 5: NMT system with convolutional encoder and LSTM decoder (requires Stanford login) | -- |

Issue

(NOTE) 2019/2/9: Passed all tests in sanity_check.py (1d, 1e, 1f); have not yet finished the VM section on the GPU.

(CLOSED) 2019/1/20: The TA has just updated the expected test result for this example.

The difference: if you run the test before training on the Stanford Sentiment Treebank (as I did), you will get loss = 16.15119285363322; once you run python run.py and then go back and run python word2vec.py, you will get loss = 14.3018669327. Be careful!
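For context, running python word2vec.py executes the module's built-in self-test. A minimal sketch of invoking it directly, assuming the starter code exposes a test_word2vec() entry point that seeds its own RNGs (both are assumptions about the a2 starter code):

```python
# Hypothetical direct invocation of the assignment's self-test; the module
# and function name are assumptions based on the CS224n a2 starter code.
from word2vec import test_word2vec

test_word2vec()  # prints Skip-Gram losses and gradients to compare by hand
```

If the harness does seed its RNGs, the printed loss should be deterministic across runs, so a change in the number points to updated expected values (or changed code) rather than sampling noise.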

--Original topic--

In assignment2/word2vec.py, my results for the loss, gradCenterVec, and gradOutsideVecs using Skip-Gram with negSamplingLossAndGradient are not close to the expected values given by the TA. Can anyone give me some advice? Is there some small detail I overlooked? Thanks.

Skip-Gram with negSamplingLossAndGradient
Your Result:
Loss: 16.15119285363322
Gradient wrt Center Vectors (dJ/dV):
 [[ 0.          0.          0.        ]
 [ 0.          0.          0.        ]
 [-4.54650789 -1.85942252  0.76397441]
 [ 0.          0.          0.        ]
 [ 0.          0.          0.        ]]
 Gradient wrt Outside Vectors (dJ/dU):
 [[-0.69148188  0.31730185  2.41364029]
 [-0.22716495  0.10423969  0.79292674]
 [-0.45528438  0.20891737  1.58918512]
 [-0.31602611  0.14501561  1.10309954]
 [-0.80620296  0.36994417  2.81407799]]

Expected Result: Value should approximate these:
Loss: 14.3018669327
Gradient wrt Center Vectors (dJ/dV):
 [[ 0.          0.          0.        ]
 [ 0.          0.          0.        ]
 [-3.86035429 -2.8660339  -0.9739887 ]
 [ 0.          0.          0.        ]
 [ 0.          0.          0.        ]]
 Gradient wrt Outside Vectors (dJ/dU):
 [[-0.30559455  0.14022886  1.06668785]
 [-0.12708467  0.05831563  0.44359323]
 [-0.45528438  0.20891737  1.58918512]
 [-0.73739425  0.33836976  2.57389893]
 [-0.64496237  0.29595533  2.25126239]]
  • Furthermore, I looked up the concept of negative sampling in NLP and considered the following question: does it make sense to compute the loss of the positive sample via 1 minus the negative probability? Yes, it does; see the sketch below.
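To make the "1 − negative probability" point concrete: the standard negative-sampling loss for one (center, outside) pair is J = −log σ(u_oᵀv_c) − Σ_{k=1..K} log σ(−u_kᵀv_c), and since σ(−x) = 1 − σ(x), the negative terms are exactly log(1 − σ(u_kᵀv_c)). Below is a minimal NumPy sketch of this loss and its gradients; the function and variable names are mine, not the assignment's starter-code signatures.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def neg_sampling_loss_and_gradient(center_vec, outside_idx, outside_vecs, neg_indices):
    """Negative-sampling loss for one (center, outside) pair.

    center_vec   : (d,)   center word vector v_c
    outside_idx  : int    row of the true outside word u_o in outside_vecs
    outside_vecs : (V, d) matrix of outside vectors U
    neg_indices  : list   K sampled negative rows (repeats allowed)
    """
    u_o = outside_vecs[outside_idx]               # (d,)
    u_k = outside_vecs[neg_indices]               # (K, d)

    pos = sigmoid(np.dot(u_o, center_vec))        # sigma(u_o . v_c)
    neg = sigmoid(-np.dot(u_k, center_vec))       # sigma(-u_k . v_c), shape (K,)

    # J = -log sigma(u_o.v_c) - sum_k log sigma(-u_k.v_c)
    loss = -np.log(pos) - np.sum(np.log(neg))

    # dJ/dv_c = (sigma(u_o.v_c) - 1) u_o + sum_k (1 - sigma(-u_k.v_c)) u_k
    grad_center = (pos - 1.0) * u_o + np.dot(1.0 - neg, u_k)

    # dJ/dU: only the true outside row and the sampled negative rows are
    # nonzero; += handles a negative index that was drawn more than once.
    grad_outside = np.zeros_like(outside_vecs)
    grad_outside[outside_idx] += (pos - 1.0) * center_vec
    for i, k in enumerate(neg_indices):
        grad_outside[k] += (1.0 - neg[i]) * center_vec

    return loss, grad_center, grad_outside

# Tiny usage example with random vectors (dimensions only for illustration).
rng = np.random.default_rng(0)
U = rng.standard_normal((5, 3))
v_c = rng.standard_normal(3)
loss, dv, dU = neg_sampling_loss_and_gradient(v_c, 1, U, [0, 3, 3, 4])
```

One plausible source of a mismatch like the one above is duplicate negative samples: the same index can be drawn more than once, and each draw must contribute to both the loss and the gradient, which is why the accumulation uses += in a loop rather than a single vectorized assignment.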
