Twitter-Health-News-Mining

Work for a data mining class on health news data.

Dataset

Combine the datasets into a single CSV
Clean the dataset of links, hashtags, cashtags, etc. (done, but buggy)
Remove punctuation and convert words to lowercase (done, but buggy)
Determine how many tokens are in the dataset
Embed tweets as vectors (or graphs?)
Cluster tweet embeddings (half done; needs a more formal workflow)
Cluster tweets by sentiment
Track each cluster's trends over time
Identify qualities of each cluster (top-k words, semantic themes, sentiment, etc.)
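For the first step, here is a minimal sketch of combining several CSV files into one using only the standard library. It assumes every input file starts with the same header row (an assumption, since the repository's file layout is not described here) and raises an error on a mismatch rather than silently misaligning columns. `combine_csvs` is a hypothetical helper name.

```python
import csv

def combine_csvs(paths, out_path):
    """Concatenate CSV files that share a header into one CSV.

    Assumption: every input file has an identical header row.
    Returns the number of data rows written (header excluded).
    """
    header = None
    rows_written = 0
    with open(out_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        for path in paths:
            with open(path, newline="", encoding="utf-8") as f:
                reader = csv.reader(f)
                file_header = next(reader, None)
                if file_header is None:
                    continue  # skip completely empty files
                if header is None:
                    header = file_header
                    writer.writerow(header)  # write the header only once
                elif file_header != header:
                    raise ValueError(f"header mismatch in {path}")
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

Usage might look like `combine_csvs(["file1.csv", "file2.csv"], "combined.csv")`, where the file names are placeholders.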
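The two cleaning steps (stripping links, hashtags and cashtags, then removing punctuation and lowercasing) can be sketched with regular expressions. This is not the repository's code; it assumes that @mentions, #hashtags and $cashtags should be dropped as whole tokens (keeping the word after `#` is an equally valid choice). The order matters: URLs and tags are removed before punctuation, otherwise `http://t.co/x` would collapse into the junk token `httptcox`.

```python
import re
import string

URL_RE = re.compile(r"https?://\S+|www\.\S+")
# @mentions, #hashtags and $cashtags: the whole token is dropped (assumption)
TAG_RE = re.compile(r"[@#$]\w+")
PUNCT_TABLE = str.maketrans("", "", string.punctuation)

def clean_tweet(text):
    """Remove URLs and tags, then strip ASCII punctuation and lowercase."""
    text = URL_RE.sub(" ", text)   # links first, before punctuation is touched
    text = TAG_RE.sub(" ", text)   # then @, # and $ tokens
    text = text.translate(PUNCT_TABLE).lower()
    return " ".join(text.split())  # collapse repeated whitespace
```

For example, `clean_tweet("Flu season is here! http://bit.ly/abc #health $JNJ @CNN")` returns `"flu season is here"`.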
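Counting tokens is straightforward once tweets are cleaned. The sketch below assumes tokens are simply whitespace-separated words of already-cleaned text and reports the total token count, the vocabulary size (distinct tokens), and the per-token counts. `token_stats` is a hypothetical name.

```python
from collections import Counter

def token_stats(cleaned_tweets):
    """Return (total tokens, distinct tokens, Counter of token frequencies)."""
    counts = Counter()
    for tweet in cleaned_tweets:
        counts.update(tweet.split())  # whitespace tokenization
    return sum(counts.values()), len(counts), counts
```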
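The embedding and clustering steps could be prototyped without external libraries. The sketch below swaps in TF-IDF vectors as the embedding and spherical k-means (cosine similarity) as the clusterer; neither is confirmed to be what the notebooks use. It assumes `k` is no larger than the number of tweets and uses a deterministic farthest-point initialization so results are reproducible.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """One sparse, L2-normalized TF-IDF vector (dict term -> weight) per doc."""
    tokenized = [doc.split() for doc in docs]
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # smoothed idf, so terms present in every doc still get weight 1
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors

def cosine(a, b):
    # vectors are unit length, so the dot product equals cosine similarity
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())

def kmeans(vectors, k, iters=20):
    """Spherical k-means; returns one cluster label per vector."""
    centroids = [dict(vectors[0])]
    while len(centroids) < k:
        # farthest-point init: the vector least similar to its closest centroid
        far = min(vectors, key=lambda v: max(cosine(v, c) for c in centroids))
        centroids.append(dict(far))
    labels = None
    for _ in range(iters):
        new_labels = []
        for v in vectors:
            sims = [cosine(v, c) for c in centroids]
            new_labels.append(sims.index(max(sims)))
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if not members:
                continue  # keep the previous centroid for an empty cluster
            summed = Counter()
            for v in members:
                summed.update(v)  # element-wise sum of sparse vectors
            norm = math.sqrt(sum(w * w for w in summed.values())) or 1.0
            centroids[c] = {t: w / norm for t, w in summed.items()}
    return labels
```

Dense sentence embeddings would likely cluster semantic themes better than TF-IDF; this version is only meant to make the workflow concrete.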
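Grouping tweets by sentiment needs a sentiment score first. A minimal lexicon-based sketch follows; the two word sets are a hypothetical toy lexicon for illustration only, and a real run would more likely use a published lexicon such as VADER's. Each tweet is labeled by the sign of (positive hits minus negative hits).

```python
# Hypothetical toy lexicon, for illustration only.
POSITIVE = {"good", "better", "cure", "healthy", "recover", "safe"}
NEGATIVE = {"bad", "worse", "death", "outbreak", "risk", "sick"}

def sentiment_label(cleaned_tweet):
    """Label a cleaned tweet by the sign of its lexicon score."""
    tokens = cleaned_tweet.split()
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

def group_by_sentiment(cleaned_tweets):
    """Bucket tweets into positive / neutral / negative groups."""
    groups = {"positive": [], "neutral": [], "negative": []}
    for tweet in cleaned_tweets:
        groups[sentiment_label(tweet)].append(tweet)
    return groups
```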
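To track each cluster over time, one simple option is to count tweets per cluster per month. This sketch assumes each tweet has a date string in ISO `YYYY-MM-DD` form (pass a different `strptime` format via `fmt` if the data differs) and a cluster label from the clustering step. `cluster_trends` is a hypothetical name.

```python
from collections import defaultdict
from datetime import datetime

def cluster_trends(dates, labels, fmt="%Y-%m-%d"):
    """Return {cluster: {"YYYY-MM": tweet count}} with months in ascending order."""
    trends = defaultdict(lambda: defaultdict(int))
    for date_str, label in zip(dates, labels):
        month = datetime.strptime(date_str, fmt).strftime("%Y-%m")
        trends[label][month] += 1
    return {label: dict(sorted(months.items())) for label, months in trends.items()}
```

Monthly buckets are an arbitrary choice; changing the `strftime` pattern to `"%Y-%W"` would give weekly counts instead.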
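For the top-k words of each cluster, a frequency count per cluster is a reasonable starting point. The sketch assumes cleaned, whitespace-tokenized tweets and one cluster label per tweet; the optional `stopwords` set is left for the caller to supply.

```python
from collections import Counter

def top_k_words(cleaned_tweets, labels, k=10, stopwords=frozenset()):
    """Return {cluster: [k most frequent non-stopword tokens]}."""
    per_cluster = {}
    for tweet, label in zip(cleaned_tweets, labels):
        per_cluster.setdefault(label, Counter()).update(
            t for t in tweet.split() if t not in stopwords
        )
    return {label: [w for w, _ in counts.most_common(k)]
            for label, counts in per_cluster.items()}
```

Raw frequency tends to surface words common to every cluster; ranking by TF-IDF weight within each cluster is a common refinement.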


MIT License

Languages: Jupyter Notebook 98.2%, Python 1.7%, Makefile 0.0%