Backend for a Chrome extension that filters social media content based on sentiment and users' chosen filter words. This repository contains the models and the scripts used to create them. The extension itself can be found here. The API repository can be found here.
initial_modeling.py - Script for the sentiment modeling
preprocessing_script.py - Script for preprocessing the tweets; word filtering also takes place here
topic_modeling.py - Script for topic modeling (unfinished; not used in the final extension)
Capstone_Glove_Word_Embeddings.ipynb - Exploration of GloVe word vectors for classification
Capstone_Fasttext_Word_Embeddings.ipynb - Exploration of fastText word vectors for classification
Tweet_Extraction_from_UMich_Tweets.ipynb - Extracting a test set from a random sample of tweets provided by UMSI
Capstone_LinearSVC_Model_Eval.ipynb - Evaluating the LinearSVC model on the test set derived from the UMSI tweet sample
bigram_tweet_df.csv - Cut-down dataset containing tweets
training.1600000.processed.noemoticon.csv - Cut-down dataset containing tokenized tweets (the Sentiment140 corpus)
thesaurus.json - The original thesaurus file
thesaurus_lem.json - The lemmatized thesaurus file
sampled_tweets.csv - Tweets sampled from the UMSI tweet set, with preliminary sentiment labels from the VADER algorithm
cleaned_human_responses.csv - Modified version of sampled_tweets.csv with human labels added
LinearSVCModel.sav - Linear SVC Model Pickle
MNBModel.sav - Multinomial Naive Bayes Model Pickle
phrasemodel.sav - Phrase Model Pickle
SGDModel.sav - Stochastic Gradient Descent Model Pickle
Images - Confusion matrices and the accuracy vs. model size graph
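The word-filtering step in preprocessing_script.py pairs the user's filter words with the lemmatized thesaurus (thesaurus_lem.json) so that synonyms are caught as well. A minimal stdlib-only sketch of that idea follows; the thesaurus schema, the inline toy data, and the function names here are illustrative assumptions, not the repository's actual API:

```python
import re

# Toy stand-in for thesaurus_lem.json, assumed here to map a lemma to
# its synonyms. The real file's schema may differ.
THESAURUS = {
    "sad": ["unhappy", "miserable", "gloomy"],
    "angry": ["mad", "furious", "irate"],
}

def expand_filter_words(words, thesaurus):
    """Expand each user filter word with its thesaurus synonyms."""
    expanded = set()
    for w in words:
        w = w.lower()
        expanded.add(w)
        expanded.update(thesaurus.get(w, []))
    return expanded

def should_hide(tweet, filter_words):
    """Hide a tweet if any of its tokens matches an expanded filter word."""
    tokens = re.findall(r"[a-z']+", tweet.lower())
    return any(tok in filter_words for tok in tokens)

words = expand_filter_words(["sad"], THESAURUS)
print(should_hide("Feeling miserable today", words))  # True
print(should_hide("Great day outside", words))        # False
```

In the real pipeline the tweet tokens would also be lemmatized before matching, which is why the lemmatized thesaurus exists alongside the original thesaurus.json.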
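The .sav files are ordinary pickles, so the API can restore them with the pickle module. A sketch of the save/load round trip, assuming (as is common but not confirmed here) that each pickle bundles the vectorizer and classifier in a single scikit-learn Pipeline; the toy training data is illustrative only:

```python
import pickle

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy training data standing in for the tweet corpus.
texts = ["i love this", "what a great day", "i hate this", "this is awful"] * 5
labels = [1, 1, 0, 0] * 5  # 1 = positive, 0 = negative

# TF-IDF + LinearSVC pipeline; the actual training setup behind
# LinearSVCModel.sav is an assumption here.
model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(texts, labels)

blob = pickle.dumps(model)      # analogous to writing LinearSVCModel.sav
restored = pickle.loads(blob)   # analogous to loading it in the API
print(restored.predict(["i love sunny days"]))
```

With a real file, `pickle.load(open("LinearSVCModel.sav", "rb"))` would take the place of `pickle.loads(blob)`; unpickling requires the same scikit-learn classes to be importable in the loading environment.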