jerrycyng / Natural-Language-Processing-Classification-and-Clustering

NLP Classification and Clustering with spam SMS dataset

Natural Language Processing Classification and Clustering

Project Context

This is an SMS spam dataset: a public set of labeled SMS messages collected for mobile phone spam research. The classification goal is to predict whether a message is spam or ham (a legitimate message).

The dataset was downloaded from https://archive.ics.uci.edu/ml/datasets/sms+spam+collection, and it is also available here in CSV format.
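A minimal sketch of reading the data, assuming the raw UCI layout of one message per line with the label and the text separated by a tab. The helper name read_sms and the two sample messages are made up for illustration, and the CSV copy mentioned above may use a different layout.

```python
import csv
import io


def read_sms(lines):
    """Parse 'label<TAB>text' lines into (label, text) pairs."""
    # QUOTE_NONE keeps quotation marks inside messages as literal characters.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or malformed lines
        # Re-join in case the message text itself contains a tab.
        rows.append((row[0], "\t".join(row[1:])))
    return rows


# In-memory stand-in for the real file (assumed layout, invented messages).
sample = io.StringIO(
    "ham\tOk, see you at lunch then\n"
    "spam\tWINNER!! Claim your free prize now\n"
)
messages = read_sms(sample)
```

With the real file, the same function could be called on an open file handle instead of the in-memory sample.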

Project Introduction

This project applies classification and clustering techniques from Natural Language Processing (NLP). The targets are to predict the message type (ham or spam) and to group similar SMS keywords into a number of clusters.
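The classification target can be sketched with scikit-learn's TfidfVectorizer and LinearSVC, the two components this README names under Methodologies. This is only an illustrative sketch, not the notebook's actual code: the training messages are invented, the pipeline uses default parameters, and a real run would train on the full dataset with a held-out test split.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Tiny invented training set, for illustration only (not taken from the dataset).
texts = [
    "Free prize! Call now to claim your cash reward",
    "URGENT: you have won a free holiday, text WIN to claim",
    "Win cash now, free entry, call today",
    "Are we still meeting for lunch today?",
    "I will call you when I get home",
    "Can you pick up some milk on the way back?",
]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

# TF-IDF turns each message into a weighted bag-of-words vector;
# LinearSVC then fits a linear decision boundary between the two classes.
model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(texts, labels)

predictions = model.predict(
    ["Claim your free cash prize now", "See you at lunch"]
)
```

Wrapping both steps in one pipeline keeps the vocabulary learned during fit consistent with the one used at prediction time.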

Methodologies

LinearSVC and TfidfVectorizer (Classification)

K-Means clustering (Clustering), in progress

Creator

Jerry Ng (City University of Hong Kong)

Languages

Language:Jupyter Notebook 100.0%