Problem description: https://github.com/NLP-Projects/Word-Similarity
- Download the pretrained embedding from word2vec/link and store it at `word2vec/W2V_150.txt`.
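- The `cosine_similarity()` function is implemented using this formula:

  $$\cos(\mathbf{u}, \mathbf{v}) = \frac{\mathbf{u} \cdot \mathbf{v}}{\lVert \mathbf{u} \rVert \, \lVert \mathbf{v} \rVert}$$
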
- `read_pretrain_embeds()` reads the embedding file and stores it in a binary file using `pickle` for faster loading. This function only has to be run once.
- `test()` evaluates the similarity scores using the Pearson correlation and the Spearman rank correlation:
- Pearson: 0.43698374385268623
- Spearman: 0.3964172917677552
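Cosine similarity divides the dot product of two vectors by the product of their Euclidean norms. The sketch below is a plain-Python version that assumes each embedding is a list of floats; its signature is illustrative and not necessarily the one `cosine_similarity()` uses in this project.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Return (u . v) / (||u|| * ||v||) for two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # The formula divides by the norms, so a zero vector has no defined similarity.
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


print(cosine_similarity([1.0, 0.0], [1.0, 1.0]))  # about 0.7071 (45 degrees apart)
```

The result lies in [-1, 1]: 1 for vectors pointing the same way, 0 for orthogonal ones, regardless of their lengths.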
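A minimal sketch of the parse-once, pickle-cache step, assuming `word2vec/W2V_150.txt` uses the common word2vec text layout: one word per line followed by its vector components separated by whitespace, with an optional `<vocab_size> <dim>` header line. The signature of `read_pretrain_embeds` and the helper `load_embeds` are hypothetical; only the use of `pickle` comes from the description.

```python
import os
import pickle
import tempfile
from typing import Dict, List

Embeddings = Dict[str, List[float]]


def read_pretrain_embeds(txt_path: str, pkl_path: str) -> Embeddings:
    """Parse a word2vec-style text file and cache it as a pickle (hypothetical signature)."""
    embeds: Embeddings = {}
    with open(txt_path, encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            # Assumption: lines with two fields or fewer are the optional
            # "<vocab_size> <dim>" header or blank lines, so they are skipped.
            if len(parts) <= 2:
                continue
            embeds[parts[0]] = [float(x) for x in parts[1:]]
    with open(pkl_path, "wb") as f:
        pickle.dump(embeds, f, protocol=pickle.HIGHEST_PROTOCOL)
    return embeds


def load_embeds(pkl_path: str) -> Embeddings:
    """Hypothetical helper: later runs read the binary cache instead of the text file."""
    with open(pkl_path, "rb") as f:
        return pickle.load(f)


# Usage demo with a tiny 3-dimensional file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    txt = os.path.join(tmp, "toy.txt")
    pkl = os.path.join(tmp, "toy.pkl")
    with open(txt, "w", encoding="utf-8") as f:
        f.write("2 3\nhello 0.1 0.2 0.3\nworld 1 2 3\n")
    parsed = read_pretrain_embeds(txt, pkl)
    cached = load_embeds(pkl)
```

Caching the parsed dictionary means later runs likely skip the costly string-to-float conversion of every line in the text file, which is why the text file only needs to be parsed once.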
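Spearman's rank correlation is the Pearson correlation computed on the ranks of the scores rather than on the scores themselves. The sketch below assumes `test()` collects the model's cosine score and the human-annotated gold score for each word pair into two parallel lists; the helper names are hypothetical, and the demo numbers are toy values, not results from this project.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # The 1/n factors in covariance and standard deviation cancel, so they are omitted.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for a constant list")
    return cov / (norm_x * norm_y)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the positions they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy, made-up scores for four word pairs (not results from this project).
model_scores = [0.9, 0.1, 0.5, 0.3]  # cosine similarities from the embeddings
gold_scores = [6.0, 1.0, 4.0, 5.0]   # human similarity judgements
print("Pearson:", pearson(model_scores, gold_scores))
print("Spearman:", spearman(model_scores, gold_scores))
```

If SciPy is available, `scipy.stats.pearsonr` and `scipy.stats.spearmanr` compute the same statistics (with average ranks for ties). Spearman only rewards getting the ordering of pairs right, so it is less sensitive than Pearson to the cosine scores being on a different scale from the human ratings.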