nlp sentiment-analysis regression sklearn

SentimentAnalysis

This is a small NLP project which uses sentiment analysis and machine learning to classify words with positive or negative connotations.

It was built as part of the final project for the Getting started with NLP PES IO course.

Started: Mar 2020 Finished: Mar 2020

Description:

This program reads filtered data from the IMDb movie review data set, which has been divided into test and train files. It cleans the text and then vectorizes it. The model used in this project is logistic regression. The first half of the data contains positive reviews and the second half contains negative reviews, so each review can be labeled by its position in the file. Based on these labels, the model is trained to recognize positive and negative words. Using the trained model, the program can assign a numerical value to the tone of a user's input.
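As a rough illustration of the idea described above, the sketch below labels reviews by their position (first half positive, second half negative), trains a logistic regression on word counts, and reports the predicted probability of a positive tone for new text. The toy reviews, the `tone` helper, and the choice of a probability as the "numerical value" are assumptions made for this example; they are not taken from the project's code.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Toy stand-in for the review files (hypothetical data, not from the data set).
# As in the description, positive reviews come first and negative ones second.
reviews = [
    "a wonderful and moving film",
    "great acting and a great story",
    "excellent fun from start to finish",
    "a boring and awful mess",
    "terrible plot and bad acting",
    "the worst film of the year",
]

# Label each review by its position: 1 = positive, 0 = negative.
half = len(reviews) // 2
labels = [1] * half + [0] * (len(reviews) - half)

# Turn each review into a vector of word counts.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(reviews)

# Fit a logistic regression classifier on the counts.
model = LogisticRegression()
model.fit(X, labels)


def tone(text):
    """Return the estimated probability (0 to 1) that `text` is positive."""
    features = vectorizer.transform([text])
    return float(model.predict_proba(features)[0, 1])


print(tone("a great and wonderful story"))
print(tone("an awful and boring plot"))
```

Values above 0.5 lean positive and values below 0.5 lean negative. With so little training data the numbers are only indicative.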

Steps:

STEP 1. Open the test and train files and read them into an array.

STEP 2. Clean the data by removing all unnecessary symbols and punctuation.

STEP 3. Convert the cleaned text to numeric features by feeding it through a count vectorizer.

STEP 4. Split the data and label it as positive or negative for training.

STEP 5. Train the model using logistic regression.

STEP 7. Store the trained model's weights in a dictionary, with each word as the key.

STEP 8. Accept user input and determine its score on a tone scale.
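The steps above could be sketched roughly as follows. This is a minimal example under stated assumptions, not the project's actual code: the reviews are a hypothetical in-memory list standing in for the files of STEP 1, the regular expressions in `clean_text` are illustrative, and `word_weights` and `tone_score` are names invented here. Summing per-word weights is one plausible way to build a tone scale; the project may compute it differently.

```python
import re

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def clean_text(text):
    """STEP 2: lowercase, drop HTML line breaks, punctuation and symbols."""
    text = re.sub(r"<br\s*/?>", " ", text.lower())  # HTML line breaks
    text = re.sub(r"[^a-z0-9\s]", "", text)         # punctuation and symbols
    return " ".join(text.split())                   # collapse whitespace


# Hypothetical stand-in for the data read in STEP 1:
# first half positive, second half negative.
raw_reviews = [
    "A wonderful, moving film!<br />Loved it.",
    "Great acting and a great story.",
    "Excellent fun from start to finish!",
    "Charming, funny and beautifully shot.",
    "A boring, awful mess...",
    "Terrible plot and bad acting.",
    "The worst film of the year!",
    "Dull, slow and painfully long.",
]
labels = [1] * 4 + [0] * 4

cleaned = [clean_text(review) for review in raw_reviews]

# STEP 3: turn the cleaned text into word-count vectors.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(cleaned)

# STEP 4: hold out part of the labeled data for validation.
X_train, X_val, y_train, y_val = train_test_split(
    X, labels, test_size=0.25, stratify=labels, random_state=0
)

# STEP 5: fit a logistic regression classifier.
model = LogisticRegression()
model.fit(X_train, y_train)
print("validation accuracy:", model.score(X_val, y_val))

# STEP 7: map each vocabulary word to its learned coefficient.
word_weights = {
    word: float(model.coef_[0][index])
    for word, index in vectorizer.vocabulary_.items()
}


def tone_score(text):
    """STEP 8: sum the weights of the known words in the user's input."""
    words = clean_text(text).split()
    return sum((word_weights.get(word, 0.0) for word in words), 0.0)


print(tone_score("What a great, wonderful story!"))
```

A positive score suggests a positive tone and a negative score a negative one. Words the model has never seen contribute nothing to the total.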

What I have learned:

During this project I picked up a whole bunch of new skills. They are as follows:

Data filtering
Count Vectorization
SkLearn Library
Regex
Logistic Regression

Note: I used several sources from the web to build this project. Major thanks as well to my IO mentor.

Languages

Language: Python 100.0%