dukleryoni / 285J_Twitter

1. Move the "Raw Data" folder into the 285J_Twitter/ directory.
2. Run format_data_pandas.py to combine all of the text data into a single file in Python format.
3. Run preprocessor_clean.py to apply TF-IDF and NMF to the text data. The NMF factor matrices W and H are saved as a tuple (W, H) in a Python pickle file.
4. Run generate_topics.py to print the top words in each topic.
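The TF-IDF, NMF, pickle, and top-words steps above can be sketched end to end as follows. This is a minimal illustration and not the code in preprocessor_clean.py or generate_topics.py, which may well rely on a library such as scikit-learn. The function names (tfidf_matrix, nmf, top_words), the toy documents, and the choice of multiplicative-update NMF are all assumptions made for the example.

```python
import pickle

import numpy as np


def tfidf_matrix(docs, stop_words=frozenset()):
    """Return (X, vocab), where X[i, j] is the TF-IDF weight of vocab[j] in docs[i]."""
    tokenized = [[w for w in d.lower().split() if w not in stop_words] for d in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    index = {w: j for j, w in enumerate(vocab)}
    counts = np.zeros((len(docs), len(vocab)))
    for i, toks in enumerate(tokenized):
        for w in toks:
            counts[i, index[w]] += 1
    df = (counts > 0).sum(axis=0)        # number of documents containing each term
    idf = np.log(len(docs) / df) + 1.0   # unsmoothed IDF, shifted by 1
    return counts * idf, vocab


def nmf(X, n_topics, n_iter=200, seed=0, eps=1e-9):
    """Factor X ~= W @ H with non-negative W and H (multiplicative updates)."""
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], n_topics))   # document-by-topic weights
    H = rng.random((n_topics, X.shape[1]))   # topic-by-term weights
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def top_words(H, vocab, n_words=3):
    """List the n_words highest-weighted terms for each topic (row of H)."""
    return [[vocab[j] for j in np.argsort(row)[::-1][:n_words]] for row in H]


# Toy documents standing in for the tweet text (hypothetical data).
docs = [
    "the match ended with a late goal",
    "a late goal decided the match",
    "the vote on the referendum was delayed",
    "parliament delayed the referendum vote",
]

X, vocab = tfidf_matrix(docs, stop_words={"the", "a", "on", "with", "was"})
W, H = nmf(X, n_topics=2)

# Serialize (W, H) as a tuple; pickle.dump(obj, file) is the on-disk equivalent.
blob = pickle.dumps((W, H))
W_loaded, H_loaded = pickle.loads(blob)

for k, words in enumerate(top_words(H_loaded, vocab)):
    print(f"Topic {k}: {' '.join(words)}")
```

Each row of H holds one topic's weights over the vocabulary, so sorting a row in descending order gives that topic's top words. Each row of W holds one document's weights over the topics.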



Stop word lists for Spanish, Catalan, and English are from http://www.ranks.nl/stopwords
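As a rough sketch of how three per-language lists might be merged and applied before vectorizing, the example below unions them into one set and filters a tokenized string. The helper names (load_stop_words, remove_stop_words) are hypothetical, and the short inline lists are stand-ins for illustration, not the contents of the ranks.nl lists.

```python
def load_stop_words(lines):
    """Parse one stop word per line, ignoring blank lines and case."""
    return {line.strip().lower() for line in lines if line.strip()}


def remove_stop_words(text, stop_words):
    """Lowercase and whitespace-tokenize text, dropping any stop words."""
    return [w for w in text.lower().split() if w not in stop_words]


# Tiny stand-ins for the three downloaded lists (assumed contents).
english = load_stop_words(["the", "and", "of"])
spanish = load_stop_words(["el", "la", "de"])
catalan = load_stop_words(["el", "els", "de"])

# A single combined set handles tweets that mix languages.
stop_words = english | spanish | catalan

tokens = remove_stop_words("La independencia de Catalunya and the vote", stop_words)
print(tokens)
```

With real list files, the same load_stop_words helper could be passed an open file object, since iterating a text file yields its lines.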
