codylieu / nlp_challenge

NLP Coding Challenge

Run Instructions

Follow the steps below to run:

  1. Clone this repository
  2. Execute sudo easy_install pip if you don't already have pip installed.
  3. Execute pip install virtualenv to install virtualenv
  4. Execute virtualenv . to create a virtual environment in the repository directory
  5. Execute source bin/activate to activate the virtual environment
  6. Execute pip install -r requirements.txt to install the project modules
  7. Execute python predict_review_sentiment.py to generate the model and start the sentiment prediction CLI

The Code

predict_review_sentiment.py first splits the original data attached to the challenge into training and test sets according to a ratio. Changing this ratio and rerunning overwrites the previously generated files.
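
As an illustration of a ratio-based split, here is a minimal sketch. It is not the code in TrainTestDataSplitter.py: the function name split_by_ratio, the seed parameter, and the shuffle step are assumptions made for the example, and the real script may read and write files differently.

```python
import random


def split_by_ratio(lines, train_ratio, seed=0):
    """Shuffle a copy of `lines` and split it into (train, test).

    Hypothetical helper: the name, the seed, and the shuffle are assumptions.
    """
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must be strictly between 0 and 1")
    shuffled = list(lines)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


reviews = ["review %d" % i for i in range(10)]
train, test = split_by_ratio(reviews, train_ratio=0.8)
print(len(train), len(test))
```

Writing the two halves to fixed file names is what would make a later run with a different ratio overwrite the earlier output.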

It then instantiates a classifier and passes it to a new ModelGenerator. You can swap in a different classifier to play around with the interface.
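
The sketch below shows one way a generator can accept an interchangeable classifier. Everything in it is hypothetical: ModelGeneratorSketch and MajorityClassifier are stand-ins written for this example, not the repository's classes. The only assumption about the classifier is that it exposes fit and score, the convention scikit-learn estimators follow.

```python
from collections import Counter


class MajorityClassifier:
    """Toy classifier: always predicts the most common training label."""

    def fit(self, vectors, labels):
        self.label_ = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, vectors):
        return [self.label_ for _ in vectors]

    def score(self, vectors, labels):
        # Fraction of predictions that match the true labels.
        predictions = self.predict(vectors)
        hits = sum(p == y for p, y in zip(predictions, labels))
        return hits / len(labels)


class ModelGeneratorSketch:
    """Hypothetical stand-in for ModelGenerator: wraps any fit/score classifier."""

    def __init__(self, classifier):
        self.classifier = classifier

    def fit_and_score(self, train_x, train_y, test_x, test_y):
        self.classifier.fit(train_x, train_y)
        return self.classifier.score(test_x, test_y)


generator = ModelGeneratorSketch(MajorityClassifier())
accuracy = generator.fit_and_score(
    [[0.1], [0.2], [0.9]], [1, 1, 0],  # training vectors and labels
    [[0.3], [0.8]], [1, 0],            # test vectors and labels
)
print(accuracy)
```

Because the wrapper only calls fit and score, replacing MajorityClassifier with another object that has those two methods requires no other change.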

After generating the model, saving it, fitting the classifier, and scoring it, the program prompts you for a string representing a movie review and predicts its sentiment.
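
A prompt loop of this kind could look like the sketch below. The function run_prompt, the blank-line exit, and the convention that a prediction of 1 means positive are all assumptions for illustration; the actual script may behave differently. The reader and writer are injectable so the loop can be exercised without a terminal.

```python
def run_prompt(predict, read_line=input, write=print):
    """Read reviews until a blank line; report a label for each one.

    Hypothetical helper: assumes predict(text) returns 1 for positive.
    Returns the number of reviews handled.
    """
    count = 0
    while True:
        text = read_line("Enter a movie review (blank to quit): ").strip()
        if not text:
            return count
        label = "positive" if predict(text) == 1 else "negative"
        write("Predicted sentiment: " + label)
        count += 1


# Scripted demo: a fake predictor and canned input instead of a real model.
scripted = iter(["loved it", "hated it", ""])
outputs = []
handled = run_prompt(
    predict=lambda text: 1 if "loved" in text else 0,
    read_line=lambda prompt: next(scripted),
    write=outputs.append,
)
print(handled, outputs)
```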

SentenceVectorizer turns the string into a vector using a naive transformation that discards word order and local context.
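
To make the loss of ordering concrete, here is a sketch of one common naive transformation: averaging per-word vectors. This is an assumption about the general technique, not a copy of SentenceVectorizer; the function average_vectors and the toy two-dimensional vectors are invented for the example.

```python
def average_vectors(tokens, word_vectors, size):
    """Average the vectors of known tokens; unknown tokens are skipped."""
    total = [0.0] * size
    known = 0
    for token in tokens:
        vector = word_vectors.get(token)
        if vector is None:
            continue
        total = [a + b for a, b in zip(total, vector)]
        known += 1
    if known == 0:
        # No known words: fall back to the zero vector.
        return total
    return [value / known for value in total]


# Toy vectors, invented for illustration only.
toy_vectors = {"good": [1.0, 0.0], "bad": [-1.0, 0.0], "not": [0.0, 1.0]}
first = average_vectors("not good but bad".split(), toy_vectors, size=2)
second = average_vectors("not bad but good".split(), toy_vectors, size=2)
print(first == second)
```

The two sentences contain the same words in a different order, so an averaging scheme maps them to the same vector even though their meanings differ.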

I created constants.py because I imagined this evolving into a pipeline that produces the data the model is trained on, with live or batched review data then fed in for classification. It is therefore reasonable to assume that all of this could live on a service that writes intermediate data to a local filesystem.

ModelGenerator.py reads the data generated by TrainTestDataSplitter.py and wraps each review in gensim's LabeledSentence class before feeding the reviews into the Doc2Vec model. It then looks up the resulting document vectors by their previously created tags to assemble the training and testing vectors and labels, which are used to fit and score the input classifier.
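
The tag-then-look-up pattern can be sketched without gensim. In the example below, TaggedReview is a stand-in with the same two fields (words, tags) as gensim's LabeledSentence, which newer gensim releases name TaggedDocument. The tag format TRAIN_POS_0, the helper names, and the fake vector table are assumptions; in the real code the vectors would come from the trained Doc2Vec model.

```python
from collections import namedtuple

# Stand-in with the same two fields as gensim's LabeledSentence.
TaggedReview = namedtuple("TaggedReview", ["words", "tags"])


def tag_reviews(reviews, prefix):
    """Give every review a unique tag such as 'TRAIN_POS_0' (assumed format)."""
    return [
        TaggedReview(text.lower().split(), ["%s_%d" % (prefix, i)])
        for i, text in enumerate(reviews)
    ]


def assemble(doc_vectors, prefix, count, label):
    """Look vectors up by tag and pair each with its sentiment label."""
    vectors = [doc_vectors["%s_%d" % (prefix, i)] for i in range(count)]
    return vectors, [label] * count


tagged = tag_reviews(["Great film", "Loved it"], "TRAIN_POS")
# Fake lookup table standing in for the trained model's document vectors.
fake_doc_vectors = {doc.tags[0]: [float(len(doc.words)), 0.0] for doc in tagged}
train_x, train_y = assemble(fake_doc_vectors, "TRAIN_POS", len(tagged), label=1)
print(train_x, train_y)
```

Keeping the split and the sentiment in the tag prefix is what lets the training and testing arrays be rebuilt after the model has been trained on all documents together.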

References:

  1. https://www.tensorflow.org/tutorials/word2vec
  2. https://github.com/linanqiu/word2vec-sentiments
  3. https://ahmedbesbes.com/sentiment-analysis-on-twitter-using-word2vec-and-keras.html
  4. https://radimrehurek.com/gensim/index.html
  5. https://stackoverflow.com/questions/30795944/how-can-a-sentence-or-a-document-be-converted-to-a-vector
