
Spam_Classifier_Project

A spam classifier using Natural Language Processing (NLP) is a machine learning model designed to automatically categorize and filter out unwanted or irrelevant messages, typically in the context of emails or text messages. It analyzes the content of messages and applies NLP techniques to distinguish between legitimate and spam messages based on various features, such as the presence of specific keywords, patterns, or text characteristics.

Introduction

This program is designed to classify SMS messages into two categories: spam and ham. It processes the text messages using various techniques, such as data cleaning, preprocessing, and the Bag of Words model. The Naive Bayes classifier is used for making the final classification decision.
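To make the Bag of Words idea concrete, here is a minimal sketch in plain Python. It is not the project's code: the stop-word set is a tiny illustrative stand-in, the two messages are made up, and stemming is left out to keep the example short.

```python
import re
from collections import Counter

# Tiny illustrative stop-word set (an assumption for this sketch only).
STOP_WORDS = {"a", "an", "the", "to", "you", "your", "is", "at", "for", "are", "we"}


def tokenize(message):
    """Lowercase the text, keep alphabetic runs only, and drop stop words."""
    words = re.sub(r"[^a-zA-Z]", " ", message).lower().split()
    return [w for w in words if w not in STOP_WORDS]


def bag_of_words(messages):
    """Return (vocabulary, count matrix) with one row of counts per message."""
    tokenized = [tokenize(m) for m in messages]
    vocabulary = sorted({w for tokens in tokenized for w in tokens})
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        rows.append([counts[w] for w in vocabulary])
    return vocabulary, rows


# Made-up example messages.
messages = [
    "Win a FREE prize, claim your prize now!",
    "Are we meeting for lunch at noon?",
]
vocabulary, matrix = bag_of_words(messages)
print(vocabulary)
print(matrix)
```

Each row of the matrix is one message and each column is one vocabulary word, which is the numerical form a Naive Bayes classifier can work with.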

Getting Started

These instructions will help you get a copy of the project up and running on your local machine for testing and development purposes.

Prerequisites

Before you begin, ensure you have met the following requirements:

A basic understanding of lemmatization, stemming, stop words, Bag of Words, and the Naive Bayes classifier
Dataset: SMS Spam Collection (https://archive.ics.uci.edu/dataset/228/sms+spam+collection)
Python (>=3.0)
Python libraries: pandas, nltk, scikit-learn (imported as sklearn), streamlit

You can install the required libraries using pip:

pip install pandas nltk scikit-learn streamlit

Code Description

The code is structured into several main sections:

Importing the Dataset: Reads the SMS dataset using Pandas.
Data Cleaning and Preprocessing: Cleans and preprocesses the text data, including removing non-alphabetic characters, converting to lowercase, and applying stemming and stopword removal.
Creating the Bag of Words Model: Utilizes the CountVectorizer from scikit-learn to convert the text data into numerical features.
Train-Test Split: Splits the dataset into a training set and a testing set for model evaluation.
Training the Naive Bayes Classifier: Utilizes a Multinomial Naive Bayes classifier to train the spam detection model.
Evaluating the Model: Calculates and displays the confusion matrix and accuracy score for model performance evaluation.
Creating the Streamlit App: Generates the app.py file using the Streamlit library to create a user-friendly web application for spam detection.
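The sections above can be sketched end to end as follows. This is a simplified illustration under stated assumptions, not the project's actual code: the eight messages are made up so the example runs without the dataset file, and stop words come from scikit-learn's built-in English list so that no NLTK corpus download is needed. The Streamlit step is not covered here.

```python
import re

from nltk.stem.porter import PorterStemmer
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS, CountVectorizer
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Made-up stand-in data; the real project reads the SMS dataset with pandas.
messages = [
    "WINNER!! Claim your free prize now",
    "Free entry: text WIN to 80082",
    "Urgent! Your account won a cash reward",
    "Congratulations, you won a free ticket",
    "Are we still meeting for lunch today?",
    "I'll call you when I get home",
    "Can you send me the report tomorrow?",
    "Happy birthday, see you tonight",
]
labels = ["spam", "spam", "spam", "spam", "ham", "ham", "ham", "ham"]

stemmer = PorterStemmer()


def clean(message):
    """Keep letters only, lowercase, drop stop words, and stem what is left."""
    words = re.sub(r"[^a-zA-Z]", " ", message).lower().split()
    return " ".join(stemmer.stem(w) for w in words if w not in ENGLISH_STOP_WORDS)


corpus = [clean(m) for m in messages]

# Bag of Words: one row per message, one column per vocabulary word.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(corpus).toarray()
y = [1 if label == "spam" else 0 for label in labels]  # 1 = spam, 0 = ham (a choice made here)

# Hold out a quarter of the messages for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Train the Multinomial Naive Bayes classifier and evaluate it.
model = MultinomialNB()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
acc = accuracy_score(y_test, y_pred)
print(cm)
print(acc)
```

With only eight toy messages the scores are not meaningful; the sketch only shows how the steps connect.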

Usage

You can use this code as a starting point for SMS spam classification. To use the program, follow these steps:

Install the prerequisites.
Ensure you have a dataset with SMS messages and labels.
Modify the file path to your dataset in the code.
Run the code to train and evaluate the SMS spam classifier.
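For the file path step, a loading helper might look like the sketch below. It assumes the dataset is a tab-separated file with one `label<TAB>message` row per line and no header, which is how the UCI SMS Spam Collection file is laid out. The function name `load_sms_dataset`, the column names, and the path in the comment are placeholders chosen for this example.

```python
import csv
import io

import pandas as pd


def load_sms_dataset(source):
    """Read `label<TAB>message` rows (no header) into a two-column DataFrame."""
    return pd.read_csv(
        source,
        sep="\t",
        names=["label", "message"],
        quoting=csv.QUOTE_NONE,  # messages may contain quote characters
    )


# In-memory sample so the sketch runs without the real file.
sample = io.StringIO("ham\tSee you at 5\nspam\tWIN cash now\n")
df = load_sms_dataset(sample)
print(df)

# With the real dataset, pass your own path instead (placeholder shown):
# df = load_sms_dataset("path/to/SMSSpamCollection")
```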

Screenshots

Dataset Frame
After Data Cleaning
Difference between the original dataset and the cleaned dataset
Bag of Words (X)
TF-IDF (X)
y: array of 0s and 1s encoding the label ('spam' or 'ham')
X (independent variable) and y (dependent variable)
Confusion Matrix
Accuracy Score
Final Outcome
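To show how the confusion matrix and the accuracy score relate, here is a small hand-rolled sketch in plain Python. The label arrays are made up, and encoding spam as 1 is an assumption of this example. The matrix layout (rows are actual classes, columns are predicted classes) matches what scikit-learn's confusion_matrix returns for labels 0 and 1.

```python
def confusion_counts(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]] for binary labels, with 1 as the positive class."""
    tn = fp = fn = tp = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1
        elif actual == 1 and predicted == 0:
            fn += 1
        elif actual == 0 and predicted == 1:
            fp += 1
        else:
            tn += 1
    return [[tn, fp], [fn, tp]]


def accuracy(matrix):
    """Accuracy is the share of predictions on the matrix diagonal."""
    (tn, fp), (fn, tp) = matrix
    return (tn + tp) / (tn + fp + fn + tp)


# Made-up labels: 1 = spam, 0 = ham.
y_true = [0, 0, 0, 1, 1, 0, 1, 0]
y_pred = [0, 0, 1, 1, 0, 0, 1, 0]

matrix = confusion_counts(y_true, y_pred)
print(matrix)            # [[4, 1], [1, 2]]
print(accuracy(matrix))  # 0.75
```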

Packages And Libraries

pandas
re
nltk
scikit-learn
Streamlit
pickle
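One plausible way pickle fits into a project like this is to save the fitted vectorizer and classifier so that an app can reuse them without retraining. The sketch below is an assumption about that workflow, not the project's code: the training texts are made up, and the file name `spam_model.pkl` and the helper `predict_label` are hypothetical.

```python
import pickle
import tempfile
from pathlib import Path

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Made-up training data: 1 = spam, 0 = ham.
texts = ["free prize win cash", "claim free reward", "lunch at noon", "see you tomorrow"]
labels = [1, 1, 0, 0]

vectorizer = CountVectorizer()
model = MultinomialNB()
model.fit(vectorizer.fit_transform(texts), labels)

# Save both objects together; the vectorizer is needed again at prediction time.
artifact_path = Path(tempfile.mkdtemp()) / "spam_model.pkl"  # hypothetical file name
with open(artifact_path, "wb") as f:
    pickle.dump({"vectorizer": vectorizer, "model": model}, f)

# Later (for example, inside an app), load the objects back.
with open(artifact_path, "rb") as f:
    loaded = pickle.load(f)


def predict_label(message):
    """Vectorize one message with the saved vocabulary and classify it."""
    features = loaded["vectorizer"].transform([message])
    return "spam" if loaded["model"].predict(features)[0] == 1 else "ham"


print(predict_label("free cash prize"))
```

A Streamlit app.py could load the same file once at startup and pass the text from st.text_input to a helper like predict_label. Only unpickle files you created yourself, since loading a pickle can execute arbitrary code.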

Author

This model was developed by Ayush Verma.

About

A spam classifier is a software program or machine learning model that automatically categorizes incoming messages or content as either "spam" (unwanted or irrelevant) or "ham" (legitimate or relevant).