Natural Language Processing Classification and Clustering
Project Context
This is an SMS spam dataset: a public set of labeled SMS messages collected for mobile phone spam research. The classification goal is to predict whether a message is spam or ham (legitimate).
The dataset was downloaded from https://archive.ics.uci.edu/ml/datasets/sms+spam+collection and is also available for download in CSV format.
Project Introduction
This project applies classification and clustering techniques from Natural Language Processing (NLP). The goals are to predict the message type (ham or spam) and to group similar SMS keywords into a number of clusters.
Methodologies
LinearSVC and TfidfVectorizer (Classification)
K-Means clustering (Clustering), in progress
Creator
Jerry Ng (City University of Hong Kong)