frason88 / Health-News-Twitter-Data-Clustering

Health-News-Twitter-Data-Clustering

• Please ⭐ the repository if it helped you in any way.

Abstract:

The data was collected in 2015 using the Twitter API. The dataset contains health news tweets from more than 15 major health news agencies, such as BBC, CNN, and NYT.

Data Set Information:

Each file corresponds to the Twitter account of one news agency. For example, bbchealth.txt contains the BBC health news tweets. Each line has the form tweet id|date and time|tweet, with '|' as the separator. This text data has been used to evaluate the performance of topic models on short text, but it can also be used for other tasks such as clustering.
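The line format above can be read with a few lines of Python. This is a minimal sketch, not the notebook's actual loader: the names `Tweet`, `parse_line`, and `load_tweets` are hypothetical, and the UTF-8 encoding with replacement of undecodable bytes is an assumption about the files.

```python
from typing import List, NamedTuple


class Tweet(NamedTuple):
    tweet_id: str
    timestamp: str
    text: str


def parse_line(line: str) -> Tweet:
    """Parse one 'tweet id|date and time|tweet' line."""
    # Split on the first two separators only, so a '|' that appears
    # inside the tweet text stays part of the text.
    tweet_id, timestamp, text = line.rstrip("\n").split("|", 2)
    return Tweet(tweet_id, timestamp, text)


def load_tweets(path: str, encoding: str = "utf-8") -> List[Tweet]:
    """Load every well-formed line of one agency file, e.g. bbchealth.txt."""
    # Assumption: undecodable bytes are replaced rather than raising an error.
    with open(path, encoding=encoding, errors="replace") as handle:
        # Skip lines that do not contain at least two separators.
        return [parse_line(line) for line in handle if line.count("|") >= 2]
```

Calling `load_tweets("bbchealth.txt")` would then return one `Tweet` per line, with the tweet text in the `text` field ready for cleaning and clustering.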

Output

  • K = 3

SSE = 3462.671, cluster 1: 1511 tweets, cluster 2: 809 tweets, cluster 3: 1609 tweets

  • K = 5

SSE = 335.843, cluster 1: 1526 tweets, cluster 2: 579 tweets, cluster 3: 720 tweets, cluster 4: 758 tweets, cluster 5: 346 tweets

  • K = 7

SSE = 3239.233, cluster 1: 605 tweets, cluster 2: 452 tweets, cluster 3: 675 tweets, cluster 4: 536 tweets, cluster 5: 762 tweets, cluster 6: 510 tweets, cluster 7: 389 tweets
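For readers unfamiliar with the figures above, SSE is the sum of squared distances from each tweet to the centroid of its assigned cluster. The README does not say which clustering algorithm or distance measure produced these numbers, so the sketch below is only an illustration under stated assumptions: a K-means-style assignment step, Jaccard distance on sets of words, and hypothetical function names (`jaccard_distance`, `assign`, `sse`, `report`).

```python
from typing import List, Set


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """Return 1 - |a & b| / |a | b|, treating two empty sets as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def assign(tweets: List[Set[str]], centroids: List[Set[str]]) -> List[List[Set[str]]]:
    """Put each tweet into the cluster of its nearest centroid."""
    clusters: List[List[Set[str]]] = [[] for _ in centroids]
    for tokens in tweets:
        # Ties go to the lowest cluster index.
        nearest = min(
            range(len(centroids)),
            key=lambda i: jaccard_distance(tokens, centroids[i]),
        )
        clusters[nearest].append(tokens)
    return clusters


def sse(clusters: List[List[Set[str]]], centroids: List[Set[str]]) -> float:
    """Sum of squared distances from every tweet to its cluster centroid."""
    return sum(
        jaccard_distance(tokens, centroid) ** 2
        for members, centroid in zip(clusters, centroids)
        for tokens in members
    )


def report(clusters: List[List[Set[str]]], centroids: List[Set[str]]) -> List[str]:
    """Format the result in the same shape as the output listed above."""
    lines = [f"SSE = {sse(clusters, centroids):.3f}"]
    for number, members in enumerate(clusters, start=1):
        lines.append(f"cluster {number}: {len(members)} tweets")
    return lines
```

A full run would repeat the assignment step and recompute centroids until the clusters stop changing; only the final assignment and the reporting are shown here.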
