Sentiment Analysis using different classification methods

This is the final project for the Machine Learning course. In it, I run sentiment analysis on this dataset using several classification methods. I first describe the data preprocessing and word vectorization methods, and then the classification methods I used.

The project notebook can be found here. Documentation is available here.

Results are available in the notebook and the documentation.

Data preprocessing

The following preprocessing tasks are done on the data:

Lowercasing words
Removing extra whitespace
Removing stopwords
Removing punctuation marks
Word lemmatization: using the nltk library
Word tokenization
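
The steps above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the notebook's actual code: the function name preprocess, the tiny STOPWORDS set, and the order of the steps are all assumptions made for the example. Lemmatization is left as an optional hook so that a lemmatizer from nltk can be passed in.

```python
import re
import string

# Hypothetical, deliberately tiny stopword list for illustration only;
# a real run would use a fuller list (for example, NLTK's English stopwords).
STOPWORDS = {"a", "an", "the", "is", "was", "and", "of", "to", "it", "this"}

# Translation table that deletes every ASCII punctuation character.
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess(text, lemmatize=None):
    """Return a list of cleaned tokens for one document."""
    text = text.lower()                                 # lowercase
    text = text.translate(PUNCT_TABLE)                  # remove punctuation marks
    text = re.sub(r"\s+", " ", text).strip()            # collapse extra whitespace
    tokens = text.split(" ") if text else []            # whitespace tokenization
    tokens = [t for t in tokens if t not in STOPWORDS]  # remove stopwords
    if lemmatize is not None:                           # optional lemmatization hook
        tokens = [lemmatize(t) for t in tokens]
    return tokens


tokens = preprocess("  The movie was GREAT,   and the acting is superb!  ")
print(tokens)  # ['movie', 'great', 'acting', 'superb']
```

To lemmatize with nltk, one option is to pass WordNetLemmatizer().lemmatize from nltk.stem as the lemmatize argument; that requires the WordNet corpus to be downloaded first.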

After that, I used undersampling to balance the classes.
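
As a rough sketch of what undersampling does, the example below randomly drops samples from the larger classes until every class has as many samples as the smallest one. The README does not say which undersampling variant or library the notebook uses, so random undersampling, the function name undersample, and the toy data are assumptions for illustration.

```python
import random
from collections import Counter, defaultdict


def undersample(samples, labels, seed=0):
    """Randomly drop samples so every class matches the smallest class size."""
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)
    if not by_class:
        return [], []

    target = min(len(items) for items in by_class.values())
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible

    balanced = []
    for label, items in by_class.items():
        # Keep `target` randomly chosen samples from this class.
        for sample in rng.sample(items, target):
            balanced.append((sample, label))
    rng.shuffle(balanced)  # avoid leaving the classes grouped together

    return [s for s, _ in balanced], [l for _, l in balanced]


# Made-up toy data: four positive and two negative samples.
texts = ["good", "great", "fine", "nice", "bad", "awful"]
labels = ["pos", "pos", "pos", "pos", "neg", "neg"]
X_bal, y_bal = undersample(texts, labels)
print(Counter(y_bal))  # two samples remain in each class
```

Undersampling discards data from the majority class, so it tends to suit datasets where the minority class is still reasonably large.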

Word embedding

I used the following methods to embed words:

TF-IDF vectorization: using the sklearn library
CBOW (Continuous Bag of Words)
Skip-Gram
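
CBOW and Skip-Gram are the two Word2Vec training setups, and they differ in what is predicted from what. The sketch below does not train any embeddings and is not taken from the notebook; it only shows, under an assumed window size, how each setup frames its training examples. All function names here are hypothetical.

```python
def context_window(tokens, i, window):
    """Tokens within `window` positions of index i, excluding tokens[i]."""
    left = tokens[max(0, i - window):i]
    right = tokens[i + 1:i + 1 + window]
    return left + right


def cbow_examples(tokens, window=2):
    """CBOW: predict the center word from its surrounding context words."""
    return [(context_window(tokens, i, window), tokens[i])
            for i in range(len(tokens))]


def skipgram_examples(tokens, window=2):
    """Skip-Gram: predict each context word from the center word."""
    return [(tokens[i], ctx)
            for i in range(len(tokens))
            for ctx in context_window(tokens, i, window)]


sentence = ["movie", "great", "acting", "superb"]

cbow = cbow_examples(sentence, window=1)
# One example per position: (context words, center word)
print(cbow[1])  # (['movie', 'acting'], 'great')

skipgram = skipgram_examples(sentence, window=1)
# One example per (center, context) pair
print(skipgram[:3])  # [('movie', 'great'), ('great', 'movie'), ('great', 'acting')]
```

In both cases the learned word vectors are usually combined per document (for example, by averaging) before being passed to a classifier; whether the notebook does this is not stated here.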

Classification methods

The following classification methods are used:

Logistic Regression
Gaussian Naive Bayes
Random Forest
AdaBoost
Support Vector Machine (SVM)
Neural Net (MLP)
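
As a minimal sketch of how these classifiers can be compared on TF-IDF features with sklearn, consider the example below. The toy corpus, the hyperparameters, and the use of training accuracy are assumptions for illustration only; the notebook's actual settings and results are not reproduced here.

```python
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# Made-up toy corpus for illustration only; not the project's dataset.
texts = [
    "great movie superb acting",
    "loved the wonderful story",
    "excellent and enjoyable film",
    "brilliant cast great fun",
    "terrible movie awful acting",
    "hated the boring story",
    "poor and dull film",
    "bad cast no fun",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = positive, 0 = negative

# TF-IDF features; GaussianNB needs a dense array, so convert once up front.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts).toarray()

# Hyperparameters below are arbitrary choices for the sketch.
classifiers = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Gaussian Naive Bayes": GaussianNB(),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "AdaBoost": AdaBoostClassifier(n_estimators=50, random_state=0),
    "SVM": SVC(kernel="linear"),
    "MLP": MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0),
}

train_accuracy = {}
for name, clf in classifiers.items():
    clf.fit(X, labels)
    # Accuracy on the training data only; a real comparison needs a held-out split.
    train_accuracy[name] = clf.score(X, labels)
    print(f"{name}: {train_accuracy[name]:.2f}")
```

Training accuracy on eight sentences says nothing about generalization; a real comparison would score each model on a held-out test split, for example one produced by sklearn's train_test_split.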

About

This repository contains the code for the final project of the Machine Learning course taught by Dr. Abolfazl Motahari in the Spring 2023 semester at Sharif University of Technology.
