atharvajk98/UCI-Sentiment-Analysis

bidirectional-lstm convolutional-neural-networks deep-learning feedforward-neural-network linearsvc lstm-neural-networks machine-learning multinomial-naive-bayes random-forest-classifier sentiment-analysis tfidf-vectorizer uci-machine-learning wordembeddings

Implementation of Machine Learning and Deep Learning for Sentiment Analysis

Sentiment analysis is the process of determining whether a piece of writing is positive, negative or neutral.
In this project, I demonstrate how various Machine Learning and Deep Learning models can be used for sentiment analysis.

The Dataset:

The dataset used is the "Sentiment Labelled Sentences Data Set" from the UC Irvine Machine Learning Repository.
The sentences come from three different websites/fields:
- amazon.com
- imdb.com
- yelp.com
Each sentence is labelled as either 1 (for positive) or 0 (for negative).
For each website, there are 500 positive and 500 negative sentences.
This dataset was created for the paper 'From Group to Individual Labels using Deep Features', Kotzias et al., KDD 2015. (Please cite the paper if you use the dataset :))
Link to the dataset: Sentiment Labelled Sentences Data Set
The dataset is present in the Dataset folder.
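A minimal loading sketch, assuming the files keep the UCI archive layout: one sentence per line, then a tab, then the 0/1 label, in files named amazon_cells_labelled.txt, imdb_labelled.txt and yelp_labelled.txt. The `Dataset` root and the helper names (`FILES`, `parse_labelled_lines`, `load_dataset`) are assumptions for illustration; the notebooks may read the data differently (for example with pandas).

```python
from pathlib import Path

# Assumed file names (UCI archive layout); adjust to match the Dataset folder.
FILES = {
    "amazon": "amazon_cells_labelled.txt",
    "imdb": "imdb_labelled.txt",
    "yelp": "yelp_labelled.txt",
}


def parse_labelled_lines(lines):
    """Turn 'sentence<TAB>label' lines into (sentence, label) pairs."""
    pairs = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        # Split on the LAST tab so a stray tab inside the sentence stays in the text.
        sentence, label = line.rsplit("\t", 1)
        pairs.append((sentence.strip(), int(label)))
    return pairs


def load_dataset(name, root="Dataset"):
    """Load one source ('amazon', 'imdb' or 'yelp') from a hypothetical root folder."""
    path = Path(root) / FILES[name]
    with path.open(encoding="utf-8") as fh:
        return parse_labelled_lines(fh)
```

Splitting on the last tab rather than the first keeps the label parse robust even if a sentence happens to contain a tab character.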

Machine Learning models:

I have used the following Machine Learning models:

Multinomial Naive Bayes
Random Forest
LinearSVC
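To show what the first of these models computes, here is a from-scratch sketch of multinomial Naive Bayes over bag-of-words counts with add-alpha (Laplace) smoothing. This is an illustration, not the notebook's code: the repository topics suggest the notebook pairs scikit-learn style models with a TF-IDF vectorizer, whereas this sketch uses raw token counts and a deliberately simple regex tokenizer. The class name `SimpleMultinomialNB` is hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep runs of letters, digits and apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


class SimpleMultinomialNB:
    """Multinomial Naive Bayes with add-alpha smoothing over word counts."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.vocab = set()
        self.word_counts = {c: Counter() for c in self.classes}
        doc_counts = Counter(labels)
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        n_docs = len(labels)
        # log P(class): share of training sentences in each class
        self.log_prior = {c: math.log(doc_counts[c] / n_docs) for c in self.classes}
        # total token count per class, the denominator of P(word | class)
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def _log_likelihood(self, token, c):
        v = len(self.vocab)
        return math.log((self.word_counts[c][token] + self.alpha)
                        / (self.totals[c] + self.alpha * v))

    def predict(self, text):
        scores = {}
        for c in self.classes:
            score = self.log_prior[c]
            for token in tokenize(text):
                if token in self.vocab:  # words never seen in training are ignored
                    score += self._log_likelihood(token, c)
            scores[c] = score
        return max(scores, key=scores.get)
```

Usage on a made-up toy set: `SimpleMultinomialNB().fit(["great phone, works great", "love it", "terrible battery", "broke after a day, terrible"], [1, 1, 0, 0]).predict("love this phone")` returns `1`. Summing log-probabilities instead of multiplying raw probabilities avoids floating-point underflow on longer sentences.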

The code implementing these models is in 'modules/Sentiment_Analysis_ML.ipynb'.
All the trained models are stored in 'models/ML', further separated by dataset (Amazon, IMDB, Yelp).
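The README does not state how the models are serialized. As a hedged sketch for the scikit-learn style models under 'models/ML', the helpers below save and reload any picklable object at a path such as models/ML/Amazon/naive_bayes.pkl; the file naming and helper names are hypothetical. Keras models in 'models/DL' would normally use Keras' own save format rather than pickle.

```python
import pickle
import tempfile
from pathlib import Path


def model_path(root, kind, dataset, model_name):
    """Build <root>/<kind>/<dataset>/<model_name>.pkl (hypothetical naming scheme)."""
    return Path(root) / kind / dataset / f"{model_name}.pkl"


def save_model(model, path):
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)  # create the per-dataset folder if missing
    with path.open("wb") as fh:
        pickle.dump(model, fh)


def load_model(path):
    # Only unpickle files you trust: loading a pickle can execute arbitrary code.
    with Path(path).open("rb") as fh:
        return pickle.load(fh)


# Usage: round-trip a stand-in "model" through a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    path = model_path(tmp, "ML", "Amazon", "naive_bayes")
    save_model({"alpha": 1.0}, path)
    restored = load_model(path)
print(restored)  # {'alpha': 1.0}
```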

Deep Learning models:

I have used the following Deep Learning models:

Feed Forward Neural Network (FFNN)
Convolutional Neural Network (CNN)
Recurrent Neural Network (LSTM)
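To illustrate the idea behind the CNN model, the sketch below runs the forward pass of a generic text-CNN layer in plain Python: each filter slides over windows of consecutive word vectors, a ReLU is applied, and max-over-time pooling keeps one feature per filter; a sigmoid unit then turns the features into a positive-class probability. The filter shapes, hand-picked weights and function names are assumptions for illustration only; the notebooks' actual layer configuration is not described here, and nothing is trained.

```python
import math


def conv1d_maxpool(embeddings, filters, biases):
    """One pooled feature per filter.

    embeddings: list of word vectors, shape (seq_len, dim)
    filters:    list of kernels, each shape (width, dim)
    biases:     one bias per kernel
    """
    features = []
    for kernel, bias in zip(filters, biases):
        width = len(kernel)
        activations = []
        for start in range(len(embeddings) - width + 1):
            window = embeddings[start:start + width]
            # Dot product of kernel and window, summed over width and embedding dim.
            s = bias + sum(
                w * x
                for k_row, e_row in zip(kernel, window)
                for w, x in zip(k_row, e_row)
            )
            activations.append(max(0.0, s))  # ReLU
        # Max-over-time pooling; a sentence shorter than the kernel yields 0.0.
        features.append(max(activations, default=0.0))
    return features


def sigmoid_output(features, weights, bias):
    """Single dense unit with a sigmoid: probability of the positive class."""
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))
```

For example, `conv1d_maxpool([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [[[1.0, 0.0], [0.0, 1.0]], [[-1.0, -1.0]]], [0.0, 0.0])` gives `[2.0, 0.0]`. Max pooling makes the feature count independent of sentence length, which is why variable-length reviews can feed a fixed-size output layer.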

As the dataset consists of three separate sets of data, I have created a separate implementation for each of them:

Amazon Product Review Dataset ('modules/Amazon_Sentiment_Analysis_DL.ipynb')
IMDB Movie Review Dataset ('modules/IMDB_Sentiment_Analysis_DL.ipynb')
Yelp Restaurant Review Dataset ('modules/Yelp_Sentiment_Analysis_DL.ipynb')

All the trained models are stored in 'models/DL', further separated by dataset (Amazon, IMDB, Yelp).

Word Embeddings:

All the Deep Learning architectures use GloVe word embeddings.
Please download them before running the code.
The 6-billion-token (6B), 100-dimensional variant is used.
The embeddings are stored in 'Dataset/GloVe_Word_Embeddings'.
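A hedged sketch of how such embeddings are commonly wired into a model: parse the GloVe text format (a word followed by its space-separated vector values on each line) and build an embedding matrix whose row i holds the vector of the word with index i. It assumes a word index that starts at 1 (as a Keras `Tokenizer` produces), leaving row 0 for padding, and it leaves words missing from GloVe as zero rows. The function names and the file name glove.6B.100d.txt inside 'Dataset/GloVe_Word_Embeddings' are assumptions.

```python
def load_glove(lines, dim=100):
    """Parse GloVe text lines ('word v1 ... v_dim') into {word: [floats]}."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip malformed or wrong-dimension lines
        vectors[parts[0]] = [float(v) for v in parts[1:]]
    return vectors


def build_embedding_matrix(word_index, vectors, dim=100):
    """Row i = vector for the word with index i; row 0 is padding.

    Words absent from GloVe keep an all-zero row.
    """
    matrix = [[0.0] * dim for _ in range(len(word_index) + 1)]
    for word, idx in word_index.items():
        vec = vectors.get(word)
        if vec is not None:
            matrix[idx] = list(vec)  # copy so the matrix does not alias the dict
    return matrix


# Possible usage (the path is an assumption):
# with open("Dataset/GloVe_Word_Embeddings/glove.6B.100d.txt", encoding="utf-8") as fh:
#     glove = load_glove(fh, dim=100)
```

The resulting matrix is typically passed as the initial (often frozen) weights of the model's embedding layer.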

Results:

After trying various machine learning and deep learning models, I obtained the following results:

Model	Amazon Reviews	IMDB Reviews	Yelp Reviews
Multinomial Naive Bayes	85%	85%	78%
Random Forest	80%	79%	79%
LinearSVC	84%	81.5%	80%
FFNN	81.5%	84%	82%
CNN	87%	85.5%	82.5%
LSTM	87%	85%	83%
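The scores in the table can be summarized programmatically; the sketch below copies the numbers above into a dictionary, reports the best model(s) per dataset (keeping ties), and computes each model's unweighted mean across the three datasets. The names `RESULTS`, `best_models` and `mean_scores` are illustrative only.

```python
# Scores (%) copied from the results table above.
RESULTS = {
    "Multinomial Naive Bayes": {"Amazon": 85.0, "IMDB": 85.0, "Yelp": 78.0},
    "Random Forest": {"Amazon": 80.0, "IMDB": 79.0, "Yelp": 79.0},
    "LinearSVC": {"Amazon": 84.0, "IMDB": 81.5, "Yelp": 80.0},
    "FFNN": {"Amazon": 81.5, "IMDB": 84.0, "Yelp": 82.0},
    "CNN": {"Amazon": 87.0, "IMDB": 85.5, "Yelp": 82.5},
    "LSTM": {"Amazon": 87.0, "IMDB": 85.0, "Yelp": 83.0},
}


def best_models(results):
    """For each dataset: (top score, sorted list of every model reaching it)."""
    datasets = next(iter(results.values())).keys()
    summary = {}
    for ds in datasets:
        top = max(scores[ds] for scores in results.values())
        winners = sorted(m for m, scores in results.items() if scores[ds] == top)
        summary[ds] = (top, winners)
    return summary


def mean_scores(results):
    """Unweighted mean over the datasets for each model."""
    return {m: sum(s.values()) / len(s) for m, s in results.items()}
```

Run on these numbers, `best_models(RESULTS)` gives CNN and LSTM tied on Amazon (87.0), CNN alone on IMDB (85.5) and LSTM alone on Yelp (83.0), and both CNN and LSTM average 85.0 across the three datasets.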

About

Implementation of various Machine Learning and Deep Learning models for Sentiment Analysis on the 'Sentiment Labelled Sentences Data Set' by University of California, Irvine.


MIT License
