binary-classification logestic-regression machine-learning python scikit-learn spam-detection

SMS Spam Classification using Logistic Regression

This repository presents a machine learning model that classifies SMS messages as either spam or legitimate (ham), with a reported accuracy of 95%. Messages are converted into TF-IDF (Term Frequency-Inverse Document Frequency) features, and a Logistic Regression classifier is trained on those features to identify spam.

Dataset

This project uses the SMS Spam Collection Dataset.

The dataset contains two main columns:

v1: The label for each SMS message, either "ham" (legitimate) or "spam".
v2: The raw text of the SMS message.
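
As a rough illustration of the two-column layout described above, the following sketch loads a few rows with pandas and encodes the labels as integers. The sample rows are made up and held in memory so the snippet runs on its own; the `label`, `text`, and `is_spam` column names are hypothetical choices for readability, not names taken from the notebook.

```python
import io

import pandas as pd

# Tiny in-memory stand-in for the real CSV file; these rows are invented.
sample_csv = io.StringIO(
    "v1,v2\n"
    "ham,Are we still meeting for lunch today?\n"
    "spam,WINNER! Claim your free prize now by calling this number\n"
    "ham,I'll call you when I get home\n"
)

df = pd.read_csv(sample_csv)

# Rename the terse column names to something more descriptive (hypothetical names).
df = df.rename(columns={"v1": "label", "v2": "text"})

# Encode the labels as integers: ham -> 0, spam -> 1.
df["is_spam"] = df["label"].map({"ham": 0, "spam": 1})

print(df[["label", "is_spam"]])
```

With the real dataset, the `io.StringIO` object would be replaced by the path to the downloaded file, possibly with an explicit `encoding` argument if the default fails.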

Colab Notebook

For the full implementation and analysis of the SMS spam classification model, see the Colab Notebook.

Model and Techniques

The classifier is a Logistic Regression model. Logistic Regression is well suited to binary classification tasks such as spam detection: it estimates the probability that a message belongs to the "spam" class and assigns the message to "spam" or "ham" accordingly.
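
To make the idea concrete, here is a minimal sketch of how a logistic regression model turns a feature vector into a class decision. The weights, bias, feature values, and the 0.5 threshold are all made up for illustration; a trained model learns its own parameters from data.

```python
import math


def spam_probability(features, weights, bias):
    """Return the sigmoid of the weighted feature sum plus the bias."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical parameters for two made-up features.
weights = [2.0, -1.0]
bias = -0.5

# Here z = 2.0 * 1.0 + (-1.0) * 0.0 - 0.5 = 1.5.
p = spam_probability([1.0, 0.0], weights, bias)

# Probabilities at or above the threshold are labeled spam.
label = "spam" if p >= 0.5 else "ham"
print(round(p, 3), label)
```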

How It Works

Data Preprocessing: We preprocess the SMS message data, including text cleaning and tokenization.
Feature Extraction: We use TF-IDF (Term Frequency-Inverse Document Frequency) to convert the text data into numerical features that can be used by the machine learning model.
Model Training: We train a Logistic Regression classifier using the preprocessed and transformed data.
Model Evaluation: We measure how often the trained classifier labels messages correctly; the model reached a reported accuracy of 95%.
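
The four steps above can be sketched end to end with scikit-learn. This is a minimal illustration under stated assumptions, not the notebook's actual code: the `clean_text` helper, the toy messages, the 75/25 split, and the `max_iter` setting are all hypothetical, and the accuracy on such a tiny sample says nothing about the real dataset.

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline


def clean_text(message):
    """Lowercase the message and replace non-alphanumeric characters with spaces."""
    message = message.lower()
    message = re.sub(r"[^a-z0-9\s]", " ", message)
    return re.sub(r"\s+", " ", message).strip()


# Invented toy messages: 1 = spam, 0 = ham.
texts = [
    "WINNER! Claim your free prize now",
    "Free entry to win cash, text now",
    "Urgent! You have won a free holiday",
    "Call now to claim your cash prize",
    "Win a free phone, reply now",
    "Congratulations, you won free tickets",
    "Are we still meeting for lunch today?",
    "I'll call you when I get home",
    "Can you send me the report tomorrow?",
    "See you at the gym later",
    "Thanks for dinner last night",
    "What time does the movie start?",
]
labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

# Step 1: preprocessing (text cleaning).
cleaned = [clean_text(t) for t in texts]

# Hold out part of the data so evaluation uses messages the model has not seen.
X_train, X_test, y_train, y_test = train_test_split(
    cleaned, labels, test_size=0.25, stratify=labels, random_state=42
)

# Steps 2 and 3: TF-IDF feature extraction followed by Logistic Regression.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Step 4: evaluation on the held-out messages.
predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print(f"Accuracy on held-out toy messages: {accuracy:.2f}")
```

Wrapping the vectorizer and classifier in one pipeline means the TF-IDF vocabulary is fitted only on the training split, which avoids leaking information from the held-out messages.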

Model Accuracy

The Logistic Regression model achieved an accuracy score of 95% on the SMS spam classification task, meaning it labeled roughly 19 of every 20 evaluated messages correctly.
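
Accuracy alone can hide how well the spam class itself is detected, particularly if spam is the minority class, as is common in spam datasets. The sketch below shows one way to report precision and recall alongside accuracy with scikit-learn. The labels and predictions are hypothetical and are not results from this project.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_score,
    recall_score,
)

# Hypothetical ground truth and predictions: 1 = spam, 0 = ham.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]

# Fraction of all messages labeled correctly.
accuracy = accuracy_score(y_true, y_pred)

# Of the messages flagged as spam, the fraction that really are spam.
precision = precision_score(y_true, y_pred)

# Of the real spam messages, the fraction that were caught.
recall = recall_score(y_true, y_pred)

# Counts of true negatives, false positives, false negatives, true positives.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```

In this made-up example the accuracy is 90% even though half of the spam messages are missed, which is why the per-class figures are worth reporting next to the headline number.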

About

AI model that can classify SMS messages as spam or legitimate. Uses techniques like TF-IDF or word embeddings with Logistic Regression.

Languages

Jupyter Notebook 100.0%