chiragatha / Tweets-Clustering-KMeans

Implements the k-means algorithm in Python, using Jaccard distance as the distance metric, and analyzes various Twitter-based applications that involve truth discovery, trend analysis, and search ranking.

Tweets-Clustering-KMeans

This project applies k-means clustering to tweet analysis. The algorithm is implemented in Python and uses Jaccard distance as the distance metric between tweets. The clustering is analyzed in the context of Twitter-based applications that involve truth discovery, trend analysis, and search ranking.
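As a rough illustration of the approach described above, the sketch below clusters tweets with a Jaccard-based k-means. It is not the code from this repository: the function names (`tokenize`, `jaccard_distance`, `assign`, `update`, `kmeans`), the whitespace tokenization, and the centroid update rule (pick the cluster member with the smallest total distance to the other members, since sets of words have no arithmetic mean) are all assumptions made for the example.

```python
def tokenize(text):
    """Hypothetical tokenizer: lowercase and split on whitespace."""
    return set(text.lower().split())


def jaccard_distance(a, b):
    """Jaccard distance between two token sets: 1 - |A & B| / |A | B|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # assumption: two empty tweets count as identical
    return 1.0 - len(a & b) / len(union)


def assign(tweets, centroids):
    """Map each centroid id to the ids of the tweets closest to it."""
    clusters = {c: [] for c in centroids}
    for tid, words in tweets.items():
        nearest = min(centroids, key=lambda c: jaccard_distance(words, tweets[c]))
        clusters[nearest].append(tid)
    return clusters


def update(tweets, clusters):
    """Per cluster, pick the member with the smallest total distance to the rest."""
    new_centroids = []
    for old, members in clusters.items():
        if not members:
            new_centroids.append(old)  # keep the old centroid for an empty cluster
            continue
        best = min(
            members,
            key=lambda m: sum(jaccard_distance(tweets[m], tweets[o]) for o in members),
        )
        new_centroids.append(best)
    return new_centroids


def kmeans(tweets, seeds, max_iter=100):
    """Alternate assignment and update until the centroids stop changing."""
    centroids = list(seeds)
    for _ in range(max_iter):
        clusters = assign(tweets, centroids)
        new_centroids = update(tweets, clusters)
        if set(new_centroids) == set(centroids):
            break
        centroids = new_centroids
    return assign(tweets, centroids)


if __name__ == "__main__":
    # Toy data, not taken from Tweets.json.
    tweets = {
        1: tokenize("boston marathon bombing"),
        2: tokenize("boston marathon explosion"),
        3: tokenize("python code release"),
        4: tokenize("python code update"),
    }
    print(kmeans(tweets, seeds=[1, 3]))
```

Here `tweets` maps a tweet id to its set of words and `seeds` lists the ids of the initial centroids, which mirrors the role InitialSeeds.txt plays in this project; the actual file formats are not assumed.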

Programming language used: Python

Files included:

  1. InitialSeeds.txt - contains the initial centroids for k-means
  2. Output2.txt - sample output
  3. tweet cluster.py - k-means clustering implemented in Python
  4. Tweets.json - the Boston bombing tweets dataset

Steps to run the code:

  1. On the command line, go to the directory containing the files.
  2. Type or paste the command below to run the Python program:

     python tweetcluster.py 25 InitialSeeds.txt Tweets.json output.txt

Note: You can change the name of the output file if a file with that name already exists.

SSE value=16.85524996
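For context, SSE (sum of squared errors) is usually computed as the sum, over all clusters, of the squared distance between each member and its cluster centroid. The sketch below shows that standard definition with Jaccard distance. Whether the script in this repository computes its SSE in exactly this way is an assumption, and the names `sse` and `jaccard_distance` are hypothetical.

```python
def jaccard_distance(a, b):
    """Jaccard distance between two token sets: 1 - |A & B| / |A | B|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # assumption: two empty tweets count as identical
    return 1.0 - len(a & b) / len(union)


def sse(tweets, clusters):
    """Sum of squared Jaccard distances from each tweet to its cluster centroid.

    tweets:   dict mapping tweet id -> set of words
    clusters: dict mapping centroid tweet id -> list of member tweet ids
    """
    total = 0.0
    for centroid, members in clusters.items():
        for member in members:
            total += jaccard_distance(tweets[member], tweets[centroid]) ** 2
    return total


if __name__ == "__main__":
    # Toy data, not taken from Tweets.json.
    tweets = {
        1: {"boston", "marathon", "bombing"},
        2: {"boston", "marathon", "explosion"},
        3: {"python", "code", "release"},
        4: {"python", "code", "update"},
    }
    clusters = {1: [1, 2], 3: [3, 4]}
    print(sse(tweets, clusters))
```

A lower SSE means tweets sit closer to their centroids, so the value can be used to compare runs with different seeds or cluster counts on the same dataset.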
