Neo-Zenith / text-message-sentiment-analyzer

A mini-project on emotion classification using NLP for the course SC1015: Introduction to Data Science & Artificial Intelligence.

Home Page: https://text-sentiment-analysis.streamlit.app/

Text Message Sentiment Analyser

Preface

Objective(s)

In this highly digitalised age, texting has become the preferred mode of communication for the current generation. However, texting has indirectly eroded the art of communicating, because the emotion behind a message is easily neglected. Consequently, text messages are often misinterpreted, depending on the perspectives of the sender and the recipient.

The main objective of this project is to apply what we learnt in elementary data science and machine learning to build a simple application with the following goals:

  • To predict the emotion of a text message with reasonable accuracy.
  • To provide the predicted probability of each emotion for the given sentence, to account for cases where several emotions are present.
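The two goals above can be illustrated with a minimal sketch. It assumes scikit-learn, a TF-IDF representation and a Logistic Regression classifier; the toy sentences, the `emotion_probabilities` helper and the choice of features are made up for illustration and are not taken from the project notebook.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training data, invented for this example (not the project dataset).
texts = [
    "i feel so hopeless and alone",
    "i am feeling grouchy and annoyed",
    "i feel loved and cared for",
    "i am so happy and excited today",
    "i feel miserable and hopeless",
    "i am angry and annoyed at everyone",
    "i feel so loved by my family",
    "i am happy and cheerful",
]
labels = ["sadness", "anger", "love", "joy", "sadness", "anger", "love", "joy"]

# Vectorise the text and fit a classifier that can output class probabilities.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(texts, labels)


def emotion_probabilities(message):
    """Return a dict mapping each emotion label to its predicted probability."""
    probs = model.predict_proba([message])[0]
    return dict(zip(model.classes_, probs))


result = emotion_probabilities("i feel hopeless but also loved")
# Print the emotions from most to least likely.
for emotion, p in sorted(result.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{emotion}: {p:.2f}")
```

Reporting the full distribution rather than only the top label lets a mixed message show, for example, comparable probabilities for two emotions.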

In addition, we identified a few directions in which the application could be developed further:

  • Sentiment analysis of customer reviews of online products.
  • Sentiment analysis of IMDB ratings of movies.
  • Fine-tuning of online dating profile-matching algorithms based on the general perception of emotion in a conversation.

Skills Learnt

  • Perform Exploratory Data Analysis on unstructured data (texts) using Word Cloud.
  • Concepts of Recall, Precision & F1-score.
  • Logistic Regression, Linear Support Vector Machine & Naive Bayes Algorithm implementation in Machine Learning.
  • Implementation of Cross-Validation Check.
  • Implementation of an application's graphical user interface using Streamlit.
  • Elementary Object-Oriented Programming during the standardization of functions & classes.
  • Introduction to documentation writing.
  • Collaboration using GitHub.

Dataset

Source of Dataset

https://www.kaggle.com/praveengovi/emotions-dataset-for-nlp by Praveen

Format of Dataset

| text | emotion |
| --- | --- |
| i didnt feel humiliated | sadness |
| i can go from feeling so hopeless to so damned hopeful just from being around... | sadness |
| im grabbing a minute to post i feel greedy wrong | anger |
| i am ever feeling nostalgic about the fireplace i will know that it is still... | love |

Note: in the raw data, text and emotion are separated by a semicolon (';').

i didnt feel humiliated;sadness
i am feeling grouchy;anger
...
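As a rough sketch of how this format can be parsed, the snippet below uses Python's standard `csv` module with `;` as the delimiter. The `load_emotion_rows` helper is hypothetical and the project may well load the files differently (for example with pandas).

```python
import csv
import io


def load_emotion_rows(lines):
    """Parse 'text;emotion' lines into a list of (text, emotion) tuples."""
    # QUOTE_NONE treats any quote characters in a message as ordinary text.
    reader = csv.reader(lines, delimiter=";", quoting=csv.QUOTE_NONE)
    rows = []
    for record in reader:
        if len(record) != 2:
            continue  # skip blank or malformed lines
        text, emotion = record
        rows.append((text.strip(), emotion.strip()))
    return rows


# In-memory stand-in for a data file, using the two sample lines shown above.
sample = io.StringIO("i didnt feel humiliated;sadness\ni am feeling grouchy;anger\n")
rows = load_emotion_rows(sample)
print(rows)
# [('i didnt feel humiliated', 'sadness'), ('i am feeling grouchy', 'anger')]
```

The same function accepts an open file handle in place of the `StringIO` object.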

Contributors

Lee Juin (Alias: @Neo-Zenith)

Kassim bin Mohamad Malaysia (Alias: @kassimmalaysia)

Lee Ci Hui (Alias: @perfectsquare123)

Default Libraries

The following libraries are used throughout the project.

Note: Word Cloud has not received official support for Python 3.8.x and above. We therefore used Word Cloud unofficial as our library instead. For Python 3.7.x and below, please refer to Word Cloud. Note, however, that our project was run and tested on Python 3.8.x and above.

Custom Libraries

We have compiled the functions and classes that are used repeatedly throughout our project into a custom library, which can be found in Libraries.

Please read Libraries Information for the details of the functions and classes found within our custom library.

Miscellaneous

Issues

[FIXED] Issue on Jupyter Notebook (Ipynb files) and Github

There appears to be a widespread, ongoing issue on GitHub in which outputs from Jupyter Notebook files are printed incorrectly or not at all.

Replicable: Yes
Source of Issue: Most likely Github
Fixed: Yes
Comments: Please use an alternative IDE to inspect the main code sections. Visual Studio Code is known to be working properly.

Issue on the display of Jupyter Notebook (Ipynb files) on Github

In certain scenarios, opening our Jupyter Notebook on GitHub will not render the notebook completely, or the notebook is displayed inside a tiny scrollable box. While it is possible to read the entire notebook this way, it is highly inconvenient and certain visualisations cannot be seen in their entirety.

Replicable: Yes
Source of Issue: Most likely due to the large file size of our notebook.
Fixed: No
Comments: Please refresh the notebook if the aforementioned error occurs. Otherwise, please use an alternative IDE to inspect the main code sections. Visual Studio Code is known to be working properly.

Run-through

Overview

Our code is divided into 3 main portions:

Data Preparation

In this section, we import the necessary libraries as well as our training dataset. We also perform a simple analysis of the dataset to get a brief overview of the kind of data we are dealing with.

Please refer to Text-Message Sentiment Analyser for the details of our source code.

Exploratory Data Analysis

In this section, we perform a more in-depth analysis of our dataset. The analysis showed that the dataset required some cleaning, which we carried out in the following 3 phases:

  • Lemmatization of words
  • Removal of HTML tags and attributes
  • Removal of stopwords

We mainly use the NLTK library for dataset cleaning.
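The three cleaning phases can be sketched with the standard library alone. This is an illustration under stated assumptions, not the notebook's code: the `clean_message` helper, the tiny stopword set and the toy lemma lookup are all hypothetical stand-ins. With NLTK installed and its corpora downloaded, `nltk.corpus.stopwords.words("english")` and `nltk.stem.WordNetLemmatizer().lemmatize` could be passed in instead.

```python
import re

# Small illustrative stopword set (assumption); NLTK ships a much fuller list.
STOPWORDS = {"i", "am", "is", "the", "a", "an", "to", "and", "of", "it", "so", "in"}

# Matches an HTML tag together with any attributes inside it.
TAG_PATTERN = re.compile(r"<[^>]+>")


def clean_message(text, lemmatize=lambda word: word, stopwords=STOPWORDS):
    """Strip HTML tags, lowercase, lemmatize each token, then drop stopwords."""
    text = TAG_PATTERN.sub(" ", text)
    tokens = re.findall(r"[a-z']+", text.lower())
    lemmas = [lemmatize(token) for token in tokens]
    return [word for word in lemmas if word not in stopwords]


# Toy lookup standing in for a real lemmatizer.
toy_lemmas = {"feeling": "feel", "tears": "tear"}
cleaned = clean_message(
    "<p>I am feeling so <b>hopeless</b> and in tears</p>",
    lemmatize=lambda word: toy_lemmas.get(word, word),
)
print(cleaned)
# ['feel', 'hopeless', 'tear']
```

Here the tags are removed first so that tag names and attributes are never tokenised as words; the order of the other two phases is a design choice in this sketch.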

We use Word Cloud as our main data visualisation library.
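A word cloud is essentially a picture of word frequencies, so the counts behind one cloud per emotion can be sketched as follows. The sample rows and the `word_counts_by_emotion` helper are made up for illustration; the resulting per-emotion counts are the kind of mapping that the wordcloud package's `WordCloud.generate_from_frequencies` accepts.

```python
from collections import Counter, defaultdict

# Toy (text, emotion) rows, assumed to be cleaned already.
rows = [
    ("feel hopeless", "sadness"),
    ("feel grouchy", "anger"),
    ("feel hopeless alone", "sadness"),
]


def word_counts_by_emotion(rows):
    """Count how often each word occurs within each emotion label."""
    counts = defaultdict(Counter)
    for text, emotion in rows:
        counts[emotion].update(text.split())
    return counts


counts = word_counts_by_emotion(rows)
print(counts["sadness"].most_common(2))
# [('feel', 2), ('hopeless', 2)]
```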

Please refer to Text-Message Sentiment Analyser under Exploratory Data Analysis for the details of our source code.

Machine Learning

In this section, we train the following 3 models on our training dataset:

  • Logistic Regression
  • Naive Bayes Algorithm
  • Linear Support Vector Machine

We then applied our trained models to the validation dataset and obtained their respective Precision, Recall and F1-score.
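For one emotion label, precision is TP / (TP + FP), recall is TP / (TP + FN), and the F1-score is the harmonic mean of the two, where TP, FP and FN count true positives, false positives and false negatives for that label. The sketch below computes them by hand on made-up labels; the `per_class_scores` helper is hypothetical, and a library routine such as scikit-learn's `classification_report` would normally be used in practice.

```python
def per_class_scores(y_true, y_pred, label):
    """Return (precision, recall, f1) for a single class label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Made-up validation labels and predictions.
y_true = ["sadness", "sadness", "sadness", "sadness", "anger", "anger", "love"]
y_pred = ["sadness", "sadness", "anger", "anger", "anger", "sadness", "love"]

precision, recall, f1 = per_class_scores(y_true, y_pred, "sadness")
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# precision=0.67 recall=0.50 f1=0.57
```

For "sadness" there are 2 true positives, 1 false positive and 2 false negatives, which gives a precision of 2/3, a recall of 1/2 and an F1-score of 4/7.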

We further performed a repeated k-fold cross-validation check on each model to determine the best of the three.

Finally, we applied the best model we chose to the test dataset.
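A repeated k-fold comparison of the three model families might look like the sketch below. It assumes scikit-learn, TF-IDF features, a stratified variant of repeated k-fold and the macro-averaged F1-score as the selection metric; the toy data, fold counts and metric are illustrative choices, not the settings used in the notebook.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy data invented for this example: 4 messages per emotion.
texts = [
    "i feel hopeless", "i feel miserable and alone",
    "i am so sad today", "i feel hopeless and sad",
    "i am angry at everyone", "i feel grouchy and annoyed",
    "i am so annoyed and angry", "i feel angry and grouchy",
    "i feel loved", "i am so loved by my family",
    "i feel cared for and loved", "i love my family so much",
]
labels = ["sadness"] * 4 + ["anger"] * 4 + ["love"] * 4

candidates = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Naive Bayes": MultinomialNB(),
    "Linear SVM": LinearSVC(),
}

# 2 folds repeated 3 times gives 6 scores per model.
cv = RepeatedStratifiedKFold(n_splits=2, n_repeats=3, random_state=0)

mean_scores = {}
for name, clf in candidates.items():
    # The vectoriser sits inside the pipeline so it is refit on each training fold.
    pipeline = make_pipeline(TfidfVectorizer(), clf)
    scores = cross_val_score(pipeline, texts, labels, cv=cv, scoring="f1_macro")
    mean_scores[name] = scores.mean()

best = max(mean_scores, key=mean_scores.get)
for name, score in mean_scores.items():
    print(f"{name}: mean macro F1 = {score:.3f}")
print("Best model:", best)
```

Keeping the vectoriser inside the pipeline prevents vocabulary from the held-out fold leaking into training.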

Please refer to Text-Message Sentiment Analyser under Machine Learning for the details of our source code.

Acknowledgements

Special thanks to our Teaching Assistant, Ms. Song Nan, for providing valuable feedback and suggestions throughout the project.

