Thenx0009 / Spam_Classifier_Project

A spam classifier is a software program or machine learning model that automatically categorizes incoming messages or content as either "spam" (unwanted or irrelevant) or "ham" (legitimate or relevant).

Spam_Classifier_Project

A spam classifier using Natural Language Processing (NLP) is a machine learning model designed to automatically categorize and filter out unwanted or irrelevant messages, typically in the context of emails or text messages. It analyzes the content of messages and applies NLP techniques to distinguish between legitimate and spam messages based on various features, such as the presence of specific keywords, patterns, or text characteristics.

Introduction

This program is designed to classify SMS messages into two categories: spam and ham. It processes the text messages using various techniques, such as data cleaning, preprocessing, and the Bag of Words model. The Naive Bayes classifier is used for making the final classification decision.
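As a rough illustration of the cleaning and Bag of Words ideas mentioned above, here is a minimal sketch that uses only the Python standard library. The `STOPWORDS` set, the function names, and the sample message are hypothetical and chosen for illustration; the project itself relies on NLTK for stopwords and stemming, and stemming is left out here.

```python
import re
from collections import Counter

# Hypothetical, tiny stopword set for illustration only;
# the project uses NLTK's stopword list instead.
STOPWORDS = {"a", "an", "the", "to", "is", "you", "have", "for", "and", "of", "in"}


def clean_message(text):
    """Keep letters only, lowercase, split on whitespace, drop stopwords."""
    letters_only = re.sub(r"[^a-zA-Z]", " ", text)
    tokens = letters_only.lower().split()
    return [token for token in tokens if token not in STOPWORDS]


def bag_of_words(tokens):
    """Count how often each token occurs in one message."""
    return Counter(tokens)


tokens = clean_message("WINNER!! You have won a FREE prize, call 0800 now")
counts = bag_of_words(tokens)
print(tokens)
print(counts)
```

Each message ends up as a set of word counts, which is the kind of numerical representation a Naive Bayes classifier can work with.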

Getting Started

These instructions will help you get a copy of the project up and running on your local machine for testing and development purposes.

Prerequisites

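Before you begin, ensure you have Python with pip available. The project depends on pandas, nltk, scikit-learn, and streamlit (re and pickle ship with Python).

You can install the required libraries using pip:

pip install pandas nltk scikit-learn streamlit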

Code Description

The code is structured into several main sections:

  1. Importing the Dataset: Reads the SMS dataset using Pandas.

  2. Data Cleaning and Preprocessing: Cleans and preprocesses the text data, including removing non-alphabetic characters, converting to lowercase, and applying stemming and stopword removal.

  3. Creating the Bag of Words Model: Utilizes the CountVectorizer from scikit-learn to convert the text data into numerical features.

  4. Train-Test Split: Splits the dataset into a training set and a testing set for model evaluation.

  5. Training the Naive Bayes Classifier: Utilizes a Multinomial Naive Bayes classifier to train the spam detection model.

  6. Evaluating the Model: Calculates and displays the confusion matrix and accuracy score for model performance evaluation.

  7. Creating the Streamlit App: Generates the app.py file using the Streamlit library to create a user-friendly web application for spam detection.
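Steps 3 to 6 above can be sketched with scikit-learn roughly as follows. This is a minimal sketch, not the notebook's actual code: the toy messages, the 0/1 label encoding, the `test_size`, and the `random_state` are all assumptions made for illustration, and the cleaning step and the Streamlit app are omitted.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy corpus; the real project reads an SMS dataset with pandas.
messages = [
    "win a free prize now",
    "free cash offer claim now",
    "urgent you won a free ticket",
    "claim your free reward today",
    "are we still meeting for lunch",
    "call me when you get home",
    "see you at the office tomorrow",
    "thanks for the help yesterday",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]  # assumed encoding: 1 = spam, 0 = ham

# Step 3: Bag of Words, one column per vocabulary word.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(messages)

# Step 4: hold out part of the data for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.25, random_state=0, stratify=labels
)

# Step 5: train a Multinomial Naive Bayes classifier.
model = MultinomialNB()
model.fit(X_train, y_train)

# Step 6: evaluate on the held-out messages.
y_pred = model.predict(X_test)
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
acc = accuracy_score(y_test, y_pred)
print(cm)
print(acc)
```

With a corpus this small the scores are not meaningful; the point is only to show how the pieces connect.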

Usage

You can use this code as a starting point for SMS spam classification. To use the program, follow these steps:

  1. Install the prerequisites.
  2. Ensure you have a dataset with SMS messages and labels.
  3. Modify the file path to your dataset in the code.

  4. Run the code to train and evaluate the SMS spam classifier.
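For the dataset step, loading might look something like the sketch below. It assumes a tab-separated file with the label first and the message second, which may not match your data; the column names `label` and `message` and the in-memory sample are hypothetical. In practice you would pass your own file path to `pd.read_csv` instead of the `StringIO` object.

```python
import io

import pandas as pd

# Hypothetical in-memory sample standing in for the dataset file.
sample = io.StringIO(
    "ham\tSee you at lunch\n"
    "spam\tWin a free prize now\n"
    "ham\tCall me later\n"
)

# Assumed layout: tab-separated, no header row, label then message.
df = pd.read_csv(sample, sep="\t", names=["label", "message"])

# Encode labels as integers (assumed convention: spam = 1, ham = 0).
y = (df["label"] == "spam").astype(int)
print(df.head())
print(y.tolist())
```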

Screenshots

  1. Dataset frame

  2. After data cleaning

  3. Difference between the actual dataset and the cleaned dataset

  4. Bag of Words (X)

  5. TF-IDF (X)

  6. y: array of 0s and 1s encoding the label ('spam' / 'ham')

  7. X (independent variable) and y (dependent variable)

  8. Confusion matrix

  9. Accuracy score

  10. Final outcome

Packages And Libraries

  • pandas
  • re
  • nltk
  • Scikit-learn
  • Streamlit
  • pickle

Author

This model was developed by Ayush Verma.

Languages

Jupyter Notebook 99.5%, Python 0.5%