roanakb / Contextual-Thesaurus

Contextual Thesaurus with Facebook InferSent Embeddings


Contextual-Thesaurus

Demo Video

Description

Uses Facebook InferSent sentence embeddings to determine which synset (word sense) is correct for the given word and sentence. Candidate synsets are retrieved from WordNet.
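The selection step can be sketched as follows. The real project embeds sentences with InferSent; here a toy bag-of-words embedding and cosine similarity stand in for it, and the gloss strings are illustrative rather than actual WordNet definitions:

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words "embedding"; a stand-in for InferSent sentence vectors.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def best_sense(sentence, glosses):
    # Pick the synset gloss whose embedding is most similar to the sentence's.
    sentence_vec = embed(sentence)
    return max(glosses, key=lambda gloss: cosine(sentence_vec, embed(gloss)))
```

Swapping `embed` for InferSent's `encode` gives the contextual behavior described above: the sentence and each candidate gloss are embedded, and the gloss closest to the sentence wins.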

Sources

Facebook Research Paper for InferSent Embeddings
Repository for InferSent Embeddings

Known Limitations

This method requires the context to be very explicit in the sentence, as the example sentences below show. Further experimentation with word embeddings and classification methods may improve this.

Setup

Dependencies

Required Natural Language Tool Kit downloads:

nltk.download('averaged_perceptron_tagger')
nltk.download('wordnet')
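These two downloads cover the tagger and the WordNet corpus the tool relies on: POS-tagging the sentence narrows which senses of a word apply, and WordNet supplies the candidate synsets. A hypothetical sketch of that lookup — the function names are illustrative, not the project's actual API:

```python
def penn_to_wordnet(tag):
    # Map a Penn Treebank tag (as produced by nltk.pos_tag) to a WordNet
    # part-of-speech letter.
    if tag.startswith("J"):
        return "a"  # adjective
    if tag.startswith("V"):
        return "v"  # verb
    if tag.startswith("R"):
        return "r"  # adverb
    return "n"      # default: noun

def candidate_synsets(word, sentence):
    # Requires the NLTK downloads above; imports are deferred so that
    # penn_to_wordnet stays usable without the NLTK data installed.
    import nltk
    from nltk.corpus import wordnet as wn
    tags = dict(nltk.pos_tag(sentence.split()))
    pos = penn_to_wordnet(tags.get(word, "NN"))
    return wn.synsets(word, pos=pos)
```

Each returned synset carries a `definition()` gloss, which is the text compared against the input sentence.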

InferSent Embedding Setup:

First, download GloVe embeddings as follows (~20 min):
mkdir 'GloVe'
curl -Lo 'GloVe/glove.840B.300d.zip' http://nlp.stanford.edu/data/glove.840B.300d.zip
unzip 'GloVe/glove.840B.300d.zip' -d 'GloVe/'

Then, download the Facebook InferSent encoder as follows (~5 min):
mkdir encoder
curl -Lo encoder/infersent1.pkl https://dl.fbaipublicfiles.com/infersent/infersent1.pkl
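Once both downloads finish, the encoder can be loaded with the `models.py` module shipped in the InferSent repository. A sketch assuming the directory layout above; the parameter values follow the InferSent README for the v1 (GloVe) model:

```python
# Hyperparameters for the v1 InferSent model, per the InferSent README.
INFERSENT_PARAMS = {
    "bsize": 64, "word_emb_dim": 300, "enc_lstm_dim": 2048,
    "pool_type": "max", "dpout_model": 0.0, "version": 1,
}

def load_infersent(model_path="encoder/infersent1.pkl",
                   w2v_path="GloVe/glove.840B.300d.txt"):
    # Requires PyTorch and models.py from facebookresearch/InferSent;
    # imports are deferred so this file parses without them installed.
    import torch
    from models import InferSent
    model = InferSent(INFERSENT_PARAMS)
    model.load_state_dict(torch.load(model_path))
    model.set_w2v_path(w2v_path)
    return model
```

After loading, `model.build_vocab(sentences, tokenize=True)` followed by `model.encode(sentences, tokenize=True)` produces the sentence embeddings used for sense selection.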

Examples from Demo

sentence = "I tightened the bolt to make sure it didn't fall apart"
word = "bolt"

sentence = "The fast guy ran by in a bolt"
word = "bolt"

sentence = "The bolt during the thunderstorm shocked me"
word = "bolt"

sentence = "The current was too strong to swim against"
word = "current"

sentence = "The high current on the wire shocked me"
word = "current"
