mHamzaArain / SpamClassifier

Spam classification

Spam Classifier

Spam definition: Spam is any kind of unwanted, unsolicited digital communication that is sent out in bulk. Spam is most often sent via email, but it can also be distributed via text messages, phone calls, or social media.

What It Does:

It classifies SMS/email messages as either spam or ham (legitimate).

How It Works:

  • Extract the message text and the target class (spam or ham) from the dataset.
  • Convert the message text into input features with a TF-IDF vectorizer.
  • Split the skewed (imbalanced) data into shuffled training and test sets with StratifiedShuffleSplit from the scikit-learn (sklearn) library, so that both sets keep approximately the same spam/ham ratio.
  • Use standard classifiers to classify each message as spam or ham.
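A minimal sketch of this pipeline with scikit-learn, assuming the messages and their labels are already loaded into two Python lists. `train_spam_classifier` is a hypothetical helper, and `MultinomialNB` is only one example of a "standard classifier"; the notebook may use different classifiers and settings. Unlike the order listed above, the sketch splits the data before fitting the TF-IDF vectorizer, so the test messages do not influence the learned vocabulary.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.naive_bayes import MultinomialNB


def train_spam_classifier(texts, labels, test_size=0.2, random_state=42):
    """Hypothetical helper: stratified split -> TF-IDF -> classifier."""
    texts = np.asarray(texts, dtype=object)
    labels = np.asarray(labels)

    # One shuffled, stratified split: the spam/ham ratio is kept
    # (approximately) equal in the train and test sets, which matters
    # when one class is much rarer than the other.
    splitter = StratifiedShuffleSplit(
        n_splits=1, test_size=test_size, random_state=random_state
    )
    # Only the labels drive the split, so zeros act as a placeholder for X.
    train_idx, test_idx = next(splitter.split(np.zeros(len(labels)), labels))

    # Learn the TF-IDF vocabulary and IDF weights from training text only,
    # then apply the same transform to the held-out messages.
    vectorizer = TfidfVectorizer()
    X_train = vectorizer.fit_transform(texts[train_idx])
    X_test = vectorizer.transform(texts[test_idx])

    # Assumed classifier choice: multinomial Naive Bayes on TF-IDF features.
    model = MultinomialNB()
    model.fit(X_train, labels[train_idx])
    predictions = model.predict(X_test)
    return vectorizer, model, labels[test_idx], predictions


# Tiny made-up toy data, for illustration only (not from the real dataset).
TOY_MESSAGES = [
    "WINNER! Claim your free prize now",
    "Free entry: text WIN to claim cash",
    "URGENT: your account has won a reward",
    "Call now to claim your free holiday",
    "Congratulations, you won a free ticket",
    "Are we still meeting for lunch today?",
    "I will call you when I get home",
    "Can you send me the notes from class?",
    "Happy birthday! See you tonight",
    "Running late, be there in ten minutes",
]
TOY_LABELS = ["spam"] * 5 + ["ham"] * 5

if __name__ == "__main__":
    vec, clf, y_true, y_pred = train_spam_classifier(
        TOY_MESSAGES, TOY_LABELS, test_size=0.4
    )
    print("accuracy:", accuracy_score(y_true, y_pred))
```

With 10 toy messages and `test_size=0.4`, the stratified split holds out 4 messages, 2 of each class. On the real, skewed dataset, per-class precision and recall are usually more informative than accuracy alone.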


Dataset:

The SMS/Email Spam Collection is a set of tagged SMS messages that have been collected for SMS/Email spam research. It contains 5,567 English SMS messages, each tagged as either ham (legitimate) or spam.

You can download the raw dataset from here.

The files contain one message per line. Each line consists of two columns:

  • Class - contains the label (ham or spam)
  • Message - contains the raw text.
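A small sketch of reading this one-message-per-line format with the Python standard library. It assumes the two columns are separated by a single tab (`label<TAB>message`), which is an assumption here, so adjust the separator if the file differs. `parse_spam_collection` is a hypothetical helper; counting the labels afterwards shows how skewed the classes are.

```python
from collections import Counter


def parse_spam_collection(lines, sep="\t"):
    """Hypothetical parser for '<label><sep><message>' lines.

    Assumes a tab separator by default; blank lines are skipped and
    any line without a valid 'ham'/'spam' label raises ValueError.
    """
    labels, messages = [], []
    for line_no, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # ignore empty lines
        label, found, message = line.partition(sep)
        if not found or label not in ("ham", "spam"):
            raise ValueError(f"line {line_no}: expected 'ham' or 'spam', a separator, then text")
        labels.append(label)
        messages.append(message)
    return labels, messages


if __name__ == "__main__":
    # Made-up sample lines; in practice pass an open file object instead.
    sample = [
        "ham\tAre we still on for dinner?\n",
        "spam\tWIN a free prize! Reply YES now\n",
        "ham\tI will call you later\n",
    ]
    labels, messages = parse_spam_collection(sample)
    counts = Counter(labels)
    print(counts)  # Counter({'ham': 2, 'spam': 1})
    print(f"spam share: {counts['spam'] / len(labels):.1%}")  # spam share: 33.3%
```

Because a file object iterates line by line, the same function works on the downloaded file opened in text mode with UTF-8 encoding.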


Languages

Language: Jupyter Notebook 100.0%