theablemo / ML-Final-Project-Sentiment-Analysis

This repository contains the code for the final project of the Machine Learning course taught by Dr. Abolfazl Motahari in the Spring semester of 2023 at Sharif University of Technology.

Sentiment Analysis using different classification methods

This is the final project for the Machine Learning course. In this project, I run sentiment analysis on this dataset using different classification methods. I first describe the data preprocessing and word vectorization methods, and then the classification methods that I used.

The project notebook can be found here. Documentation is available here.

Results are available in the notebook and the documentation.

Data preprocessing

The following preprocessing tasks are done on the data:

  • Lowercasing words
  • Removing extra spaces
  • Removing stopwords
  • Removing punctuation marks
  • Word lemmatization: using the nltk library
  • Word tokenization
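
The steps above can be sketched roughly as follows. This is a dependency-free illustration, not the notebook's code: the `preprocess` function, the tiny `STOPWORDS` set, and the order of the steps are assumptions. Lemmatization is left as a pluggable `lemmatize` argument (identity by default), where the project would plug in an nltk lemmatizer.

```python
import re
import string

# Hypothetical, deliberately tiny stopword list; a real run would use a
# full list such as the one shipped with nltk.
STOPWORDS = {"a", "an", "the", "is", "are", "was", "and", "or",
             "of", "to", "in", "it", "this"}

# Translation table that deletes every ASCII punctuation character.
PUNCTUATION_TABLE = str.maketrans("", "", string.punctuation)


def preprocess(text, lemmatize=lambda word: word):
    """Return the list of cleaned tokens for one document."""
    text = text.lower()                                  # lowercase
    text = text.translate(PUNCTUATION_TABLE)             # remove punctuation
    text = re.sub(r"\s+", " ", text).strip()             # collapse extra spaces
    tokens = text.split(" ") if text else []             # tokenize on spaces
    tokens = [t for t in tokens if t not in STOPWORDS]   # remove stopwords
    return [lemmatize(t) for t in tokens]                # lemmatize each token


print(preprocess("  This movie   is GREAT, and the cast was amazing!! "))
```

Passing a different callable as `lemmatize` swaps in a real lemmatizer without changing the rest of the pipeline.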

After that, I used undersampling to balance the classes.
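
As a minimal sketch of what balancing by undersampling means, the snippet below randomly drops samples until every class is as small as the rarest one. The `undersample` helper, its fixed `seed`, and the toy data are assumptions made for illustration; the notebook may use a library routine instead.

```python
import random
from collections import defaultdict


def undersample(samples, labels, seed=0):
    """Randomly drop samples so every class matches the rarest class in size."""
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)

    target = min(len(items) for items in by_class.values())
    rng = random.Random(seed)  # fixed seed so the result is reproducible

    balanced = []
    for label, items in by_class.items():
        for sample in rng.sample(items, target):  # sample without replacement
            balanced.append((sample, label))
    rng.shuffle(balanced)  # avoid returning the data grouped by class

    new_samples = [sample for sample, _ in balanced]
    new_labels = [label for _, label in balanced]
    return new_samples, new_labels


texts = ["good", "great", "fine", "nice", "bad", "awful"]
labels = ["pos", "pos", "pos", "pos", "neg", "neg"]
balanced_texts, balanced_labels = undersample(texts, labels)
print(sorted(balanced_labels))
```

Undersampling discards data from the larger classes, so it trades some training signal for balanced class frequencies.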

Word embedding

I used the following methods to embed words:

  1. TF-IDF vectorization: using the sklearn library
  2. CBOW
  3. Skip-Gram
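
To make the difference between these methods concrete, here is a small dependency-free sketch. The `tfidf` function uses a smoothed idf and L2 normalization, which mirrors what I understand to be sklearn's default weighting. The `training_pairs` helper shows how CBOW and Skip-Gram frame the same context window in opposite directions. Both helper names and the toy documents are assumptions, and no embedding model is actually trained here.

```python
import math
from collections import Counter


def tfidf(docs):
    """TF-IDF vectors for tokenized docs (smoothed idf, L2-normalized rows)."""
    vocab = sorted({word for doc in docs for word in doc})
    n_docs = len(docs)
    doc_freq = Counter(word for doc in docs for word in set(doc))
    idf = {w: math.log((1 + n_docs) / (1 + doc_freq[w])) + 1 for w in vocab}

    vectors = []
    for doc in docs:
        counts = Counter(doc)
        row = [counts[w] * idf[w] for w in vocab]
        norm = math.sqrt(sum(x * x for x in row)) or 1.0  # guard for empty docs
        vectors.append([x / norm for x in row])
    return vocab, vectors


def training_pairs(tokens, window=2):
    """Yield (target, context words) for each position in one document."""
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        yield target, left + right


docs = [["good", "movie"], ["bad", "movie"], ["good", "cast", "good", "plot"]]
vocab, vectors = tfidf(docs)
print(vocab)

for target, context in training_pairs(docs[2], window=1):
    cbow_example = (context, target)                    # CBOW: context -> target
    skipgram_examples = [(target, c) for c in context]  # Skip-Gram: target -> context
    print(cbow_example, skipgram_examples)
```

TF-IDF yields one sparse vector per document, while CBOW and Skip-Gram learn one dense vector per word, so the word vectors of a document still have to be combined (for example by averaging) before classification.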

Classification methods

The following classification methods are used:

  • Logistic Regression
  • Gaussian Naive Bayes
  • Random Forest
  • AdaBoost
  • Support Vector Machine (SVM)
  • Neural Net (MLP)
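
As an illustration of the first method in the list, the sketch below trains a logistic regression classifier by batch gradient descent on the log loss. It is a dependency-free toy, not the notebook's code: the function names, the learning rate, the epoch count, and the two-feature toy vectors are all assumptions, and a library implementation would normally be used in practice.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(X, y, lr=0.5, epochs=500):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(X[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for features, label in zip(X, y):
            score = sum(w * f for w, f in zip(weights, features)) + bias
            error = sigmoid(score) - label  # gradient of log loss w.r.t. score
            for j, f in enumerate(features):
                grad_w[j] += error * f
            grad_b += error
        weights = [w - lr * g / len(X) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(X)
    return weights, bias


def predict(weights, bias, features):
    """Return 1 (positive) if the predicted probability is at least 0.5."""
    score = sum(w * f for w, f in zip(weights, features)) + bias
    return 1 if sigmoid(score) >= 0.5 else 0


# Toy vectors over a hypothetical vocabulary ["bad", "good"];
# label 1 = positive sentiment, 0 = negative sentiment.
X = [[0.0, 1.0], [0.0, 0.9], [1.0, 0.0], [0.8, 0.0]]
y = [1, 1, 0, 0]
weights, bias = train_logistic_regression(X, y)
print([predict(weights, bias, row) for row in X])
```

The other methods in the list accept the same kind of input (one feature vector per document plus a label), so the vectorization step can be shared across all of them.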

Languages

Jupyter Notebook: 100.0%