This is a small NLP project which uses sentiment analysis and machine learning to classify words with positive or negative connotations.
This project is a part of the final project for the Getting started with NLP PES IO course.
Started: Mar 2020 Finished: Mar 2020
This program reads filtered data from the IMDb movie data set, which has been divided into test and train sets. It cleans the data and then processes it. The model used in this project is logistic regression. The first half of the data contains positive reviews and the second half contains negative reviews. Based on this information, the model is trained to recognize positive and negative words. Using this trained model, the program can assign a numerical value to the tone of a user's input.
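As a minimal sketch of that labeling rule, assuming the reviews are already loaded into a list in file order (the helper name `label_by_position` and the sample strings are hypothetical, not taken from the project):

```python
def label_by_position(reviews):
    """Label the first half of the reviews 1 (positive) and the rest 0 (negative).

    Assumes the list keeps the file order: positives first, negatives second.
    With an odd number of reviews, the extra one falls in the negative half.
    """
    half = len(reviews) // 2
    return [1 if index < half else 0 for index in range(len(reviews))]


# Hypothetical sample data, only to show the shape of the result.
sample_reviews = ["loved it", "great film", "hated it", "awful film"]
labels = label_by_position(sample_reviews)
print(labels)  # [1, 1, 0, 0]
```

Because the labels come purely from position, this only works if the files really are ordered as described above.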
STEP 1. Open the Test and train files and read them into an array
STEP 2. Clean the data by removing all unnecessary symbols and punctuation.
STEP 3. Vectorize the data by feeding it through a count vectorizer, which turns each review into word counts
STEP 4. Split the data and train it based on negative and positive words
STEP 5. Train the model using logistic regression
STEP 7. Store the trained word weights in a dictionary, with the word as the key
STEP 8. Accept user input and determine where its tone falls on a numerical scale
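The steps above could be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual code: the six in-memory reviews stand in for the data files, and `clean`, `word_weights`, and `tone` are hypothetical names. Only `re`, `CountVectorizer`, and `LogisticRegression` are real library pieces, used here with their default settings.

```python
import re

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical stand-in for the review files: positives first, negatives second.
reviews = [
    "A great film, truly wonderful!",
    "Great acting and a wonderful story.",
    "I loved it; great fun.",
    "A terrible film, truly awful!",
    "Awful acting and a boring story.",
    "I hated it; terrible and boring.",
]


def clean(text):
    """Lowercase the text and replace anything that is not a letter or whitespace."""
    return re.sub(r"[^a-z\s]", " ", text.lower())


cleaned = [clean(review) for review in reviews]

# Turn each cleaned review into a row of word counts.
vectorizer = CountVectorizer()
features = vectorizer.fit_transform(cleaned)

# First half positive (1), second half negative (0).
half = len(cleaned) // 2
labels = [1] * half + [0] * (len(cleaned) - half)

# Fit a logistic regression on the word counts.
model = LogisticRegression()
model.fit(features, labels)

# Map each vocabulary word to its learned coefficient:
# above zero leans positive, below zero leans negative.
word_weights = {
    word: float(model.coef_[0][column])
    for word, column in vectorizer.vocabulary_.items()
}


def tone(text):
    """Return the model's probability (0 to 1) that the text is positive."""
    row = vectorizer.transform([clean(text)])
    return float(model.predict_proba(row)[0, 1])


print(word_weights["great"], word_weights["terrible"])
print(tone("great wonderful"), tone("terrible awful boring"))
```

With a corpus this small the exact numbers mean little; the point is the shape of the pipeline, where words seen only in positive reviews end up with positive weights and the tone score is the predicted probability of the positive class.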
During this project I picked up a whole bunch of new skills. They are as follows:
- Data filtering
- Count Vectorization
- scikit-learn (sklearn) library
- Regex
- Logistic Regression
Note: I used several sources from the web to build this project. Major thanks also go to my IO mentor.