EfthymiosChatziathanasiadis / K-means

Clustering Tweets with K-means

BDP 05: CLUSTERING OF LARGE UNLABELED DATASETS

Source Code

1D Clustering

  • K-Means Clustering on the Ages of the Users dataset.
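A minimal sketch of the 1D case, assuming plain Java rather than the repository's Hadoop code. The class name `KMeans1D` and its method names are hypothetical. Assigning each age to its nearest centroid corresponds to the map step, and averaging each cluster corresponds to the reduce step.

```java
class KMeans1D {

    // "Map" step: index of the centroid closest to the given value.
    static int nearest(double value, double[] centroids) {
        int best = 0;
        for (int i = 1; i < centroids.length; i++) {
            if (Math.abs(value - centroids[i]) < Math.abs(value - centroids[best])) {
                best = i;
            }
        }
        return best;
    }

    // One iteration: assign every value, then average each cluster ("reduce" step).
    static double[] iterate(double[] values, double[] centroids) {
        double[] sum = new double[centroids.length];
        int[] count = new int[centroids.length];
        for (double v : values) {
            int c = nearest(v, centroids);
            sum[c] += v;
            count[c]++;
        }
        double[] updated = new double[centroids.length];
        for (int i = 0; i < centroids.length; i++) {
            // An empty cluster keeps its previous centroid.
            updated[i] = count[i] == 0 ? centroids[i] : sum[i] / count[i];
        }
        return updated;
    }

    // Repeat until no centroid moves more than epsilon, or maxIterations is reached.
    static double[] fit(double[] values, double[] initial, double epsilon, int maxIterations) {
        double[] current = initial.clone();
        for (int it = 0; it < maxIterations; it++) {
            double[] next = iterate(values, current);
            double shift = 0.0;
            for (int i = 0; i < next.length; i++) {
                shift = Math.max(shift, Math.abs(next[i] - current[i]));
            }
            current = next;
            if (shift <= epsilon) {
                break;
            }
        }
        return current;
    }
}
```

For example, the ages 18, 20, 22, 40, 42, 44 with initial centroids 18 and 44 settle at centroids 20 and 42. In a MapReduce setting each call to `iterate` would typically be one job, with the driver deciding whether to launch another.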

2D Clustering

  • K-Means Clustering on two features of the Posts dataset (Score and ViewCount).
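A minimal sketch of one 2D iteration, assuming each post is represented as a `{score, viewCount}` pair and that squared Euclidean distance is the metric. Both are assumptions, and `KMeans2D` is a hypothetical name, not a class from the repository.

```java
class KMeans2D {

    // Squared Euclidean distance between two 2D points.
    static double dist2(double[] a, double[] b) {
        double dx = a[0] - b[0];
        double dy = a[1] - b[1];
        return dx * dx + dy * dy;
    }

    // Index of the centroid closest to the point.
    static int nearest(double[] point, double[][] centroids) {
        int best = 0;
        for (int i = 1; i < centroids.length; i++) {
            if (dist2(point, centroids[i]) < dist2(point, centroids[best])) {
                best = i;
            }
        }
        return best;
    }

    // One iteration: assign each point, then recompute each centroid as the cluster mean.
    static double[][] iterate(double[][] points, double[][] centroids) {
        int k = centroids.length;
        double[][] sum = new double[k][2];
        int[] count = new int[k];
        for (double[] p : points) {
            int c = nearest(p, centroids);
            sum[c][0] += p[0];
            sum[c][1] += p[1];
            count[c]++;
        }
        double[][] updated = new double[k][2];
        for (int i = 0; i < k; i++) {
            if (count[i] == 0) {
                // An empty cluster keeps its previous centroid.
                updated[i] = centroids[i].clone();
            } else {
                updated[i][0] = sum[i][0] / count[i];
                updated[i][1] = sum[i][1] / count[i];
            }
        }
        return updated;
    }
}
```

Because the per-cluster sums and counts can be added together in any order, the same accumulation could be done partially on each mapper before the final average is taken.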

2D Clustering with Join of Datasets

  • Includes the helper MapReduce used to preprocess the data and join the datasets.
  • K-Means Clustering on User Ages and Badge Counts from the joined Users and Badges datasets.
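The join can be sketched in plain Java as follows. This assumes the Users data reduces to a user id and an age, and the Badges data to one user id per badge awarded. Those shapes, the class name `UserBadgeJoin`, and the choice to keep users without badges (count 0) are all assumptions, not details taken from the repository.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

class UserBadgeJoin {

    // userAges: userId -> age. badgeUserIds: one entry per badge awarded.
    // Returns userId -> {age, badgeCount}, the 2D point later fed to K-Means.
    static Map<Integer, int[]> join(Map<Integer, Integer> userAges, int[] badgeUserIds) {
        // Count badges per user, as a reducer keyed on user id would.
        Map<Integer, Integer> badgeCounts = new HashMap<>();
        for (int userId : badgeUserIds) {
            badgeCounts.merge(userId, 1, Integer::sum);
        }
        // Attach the count to each known user; badges of unknown users are dropped.
        Map<Integer, int[]> joined = new TreeMap<>();
        for (Map.Entry<Integer, Integer> user : userAges.entrySet()) {
            int count = badgeCounts.getOrDefault(user.getKey(), 0);
            joined.put(user.getKey(), new int[] { user.getValue(), count });
        }
        return joined;
    }
}
```

Keying both datasets on the user id is what lets a reduce-side join bring a user's age and badge records together in the same reducer call.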

Data Preprocessing

  • Helper MapReduce used to normalise the data points.
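The normalisation method is not stated here. As an illustration only, the sketch below assumes min-max scaling of one feature to the range [0, 1]; the repository may use a different scheme, and `MinMaxNormaliser` is a hypothetical name.

```java
class MinMaxNormaliser {

    // Scales each value to [0, 1] using the minimum and maximum of the input.
    static double[] normalise(double[] values) {
        if (values.length == 0) {
            return new double[0];
        }
        double min = values[0];
        double max = values[0];
        for (double v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        double range = max - min;
        double[] scaled = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            // A constant feature carries no information, so map it to 0.
            scaled[i] = range == 0.0 ? 0.0 : (values[i] - min) / range;
        }
        return scaled;
    }
}
```

Scaling matters for distance-based clustering because a feature with a much larger numeric range would otherwise dominate the Euclidean distance. On a large dataset the minimum and maximum would be found in one pass and the scaling applied in a second.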

Extract Sample From Posts

  • Helper MapReduce used to extract sample data from the Posts dataset for plotting.
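How the sample is drawn is not described here. One simple approach, shown only as an assumption, is to keep each record independently with a fixed probability, which needs no coordination between mappers. The class name `PostSampler` and its parameters are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class PostSampler {

    // Keeps each record with probability `fraction`, preserving the input order.
    // A fixed seed makes the sample reproducible between runs.
    static List<String> sample(List<String> records, double fraction, long seed) {
        Random random = new Random(seed);
        List<String> kept = new ArrayList<>();
        for (String record : records) {
            // nextDouble() returns a value in [0.0, 1.0).
            if (random.nextDouble() < fraction) {
                kept.add(record);
            }
        }
        return kept;
    }
}
```

The size of the result is only approximately `fraction` times the input size, which is usually acceptable when the sample is only used for plotting.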

Languages

Java 100.0%