raphelemmanuvel / ml-spam-classification

Classic ML starter project to classify messages from an SMS dataset into spam and ham

ML - SMS Spam/Ham Clustering

A classic machine learning starter project that uses Python libraries and k-means clustering to group a dataset of SMS messages into 'spam' and 'ham'.
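The clustering idea can be illustrated with plain Lloyd's k-means and `k=2`, in the hope that one cluster collects spam and the other ham. This is a minimal pure-Python sketch, not the project's own code: the function name `kmeans`, the random initialisation and the toy 2-D points are assumptions for illustration. A library implementation (for example scikit-learn's `KMeans`) would normally be run on the message feature vectors instead.

```python
import random

def kmeans(points, k=2, iterations=100, seed=0):
    """Lloyd's k-means on a list of equal-length numeric vectors (illustrative sketch).

    Returns (labels, centroids). Starting centroids are k input points chosen
    at random, a simple choice (scikit-learn's KMeans defaults to k-means++).
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid (squared Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Two obvious groups of toy points; the two nearby pairs end up in different clusters.
points = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9)]
labels, centroids = kmeans(points, k=2)
print(labels, centroids)
```

Note that k-means only returns anonymous cluster ids (0 and 1); deciding which id means "spam" is a separate evaluation step.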

The dataset is a collection of 5,574 SMS messages taken from the UCI Machine Learning Repository, each of which needs to be tagged as "spam" or "ham".
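Loading the data could look like the sketch below. It assumes the tab-separated layout of the UCI SMS Spam Collection file, with one `label<TAB>message` pair per line; the helper name `load_sms` and the two sample messages are invented for illustration, not taken from the project or the dataset.

```python
import io

def load_sms(lines):
    """Parse 'label<TAB>message' lines into (labels, messages) lists.

    Assumes one tab separates the label from the text; blank lines are skipped
    and only the first tab is used as the separator.
    """
    labels, messages = [], []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue
        label, _, text = line.partition("\t")
        labels.append(label.strip().lower())
        messages.append(text)
    return labels, messages

# Two invented lines in the assumed format. With the real file you would pass an
# open file object instead (the path and encoding are up to your local copy).
sample = io.StringIO(
    "ham\tAre we still meeting for lunch?\n"
    "spam\tWINNER! Claim your free prize now\n"
)
labels, messages = load_sms(sample)
print(labels)  # ['ham', 'spam']
print(messages)
```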

The whole pipeline consists of the following steps:

  • Loading data
  • Data wrangling and pre-processing
  • Feature Selection
  • Feature Vector Modelling
  • k-means clustering and evaluation
  • Writing results
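Because k-means is unsupervised, the evaluation step needs some way to turn cluster ids into "spam"/"ham" predictions before accuracy can be measured. The sketch below uses a majority vote per cluster and then writes the predictions as CSV; the function names, the majority-vote rule and the toy cluster ids are assumptions for illustration, not necessarily how the project evaluates or stores its results.

```python
import csv
import io
from collections import Counter

def map_clusters_to_labels(cluster_ids, true_labels):
    """Name each cluster after the most common true label among its members."""
    mapping = {}
    for cid in set(cluster_ids):
        members = [lab for c, lab in zip(cluster_ids, true_labels) if c == cid]
        mapping[cid] = Counter(members).most_common(1)[0][0]
    return mapping

def accuracy(predicted, true_labels):
    """Fraction of messages whose predicted label matches the true label."""
    return sum(p == t for p, t in zip(predicted, true_labels)) / len(true_labels)

def write_results(messages, predicted, out):
    """Write one CSV row per message with its predicted label."""
    writer = csv.writer(out)
    writer.writerow(["message", "predicted"])
    writer.writerows(zip(messages, predicted))

# Made-up cluster ids and ground-truth labels for five messages.
cluster_ids = [1, 1, 0, 1, 0]
true_labels = ["ham", "ham", "spam", "spam", "spam"]
mapping = map_clusters_to_labels(cluster_ids, true_labels)
predicted = [mapping[c] for c in cluster_ids]
acc = accuracy(predicted, true_labels)
print(mapping, acc)  # {0: 'spam', 1: 'ham'} 0.8

buf = io.StringIO()
write_results(["msg a", "msg b", "msg c", "msg d", "msg e"], predicted, buf)
print(buf.getvalue())
```

One caveat of the majority vote: if one class dominates every cluster, both clusters can be mapped to the same label, which is itself a sign that the clustering did not separate spam from ham.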

Although there are multiple ways to solve this problem, a TF-IDF (term frequency-inverse document frequency) approach is used here to turn each message into a feature vector, with the aim of obtaining high prediction accuracy.
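A TF-IDF feature vector weights each word by how often it appears in a message and down-weights words that appear in many messages. The pure-Python sketch below is illustrative only: the regex tokenizer, the function names and the sample messages are assumptions, and it is not the project's implementation. The idf uses the smoothed form ln((1 + n) / (1 + df)) + 1, which is also scikit-learn's `TfidfVectorizer` default, although that class tokenizes differently.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase and keep runs of letters/digits (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def tfidf(docs):
    """Return (vocabulary, vectors): one L2-normalised TF-IDF vector per document.

    tf is the raw term count; idf is ln((1 + n) / (1 + df)) + 1.
    """
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n = len(docs)
    # Document frequency: in how many documents each term occurs at least once.
    df = Counter(t for toks in tokenized for t in set(toks))
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        row = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in row)) or 1.0  # avoid dividing by zero for empty docs
        vectors.append([v / norm for v in row])
    return vocab, vectors

docs = ["win a free prize now", "are we meeting for lunch", "free prize claim now"]
vocab, vectors = tfidf(docs)
# Rows are unit length, so a dot product is the cosine similarity.
print(sum(a * b for a, b in zip(vectors[0], vectors[2])))  # the two "free prize" messages
print(sum(a * b for a, b in zip(vectors[0], vectors[1])))  # no shared words
```

Normalising every row to unit length also suits k-means: for unit vectors the squared Euclidean distance equals 2 - 2·cos, so clustering by distance is equivalent to clustering by cosine similarity.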

Languages

Language: Python 100.0%