howardyclo / Digestant

Modules for effectively digesting data from Twitter and Reddit using ML, NLP and statistics.

Digestant

See more in the introduction slides, project survey, and demo.

Dev Environment

  • Python 3.x

Setup

  • It is recommended to create a new virtual environment to manage this Python project.
  • Install the Python packages listed in requirements.txt: $ pip install -r requirements.txt.
  • Download the NLTK data: $ python -m nltk.downloader all.
  • Download the SpaCy en_core_web_md model: $ python -m spacy download en_core_web_md.
  • Download the Stanford NER model (stanford-ner-xxxx-xx-xx zip file):
    1. Download it from the official website.
    2. Unzip it and place the stanford-ner-xxxx-xx-xx folder in the project root, renaming the folder to stanford-ner/. (A sanity-check sketch follows this list.)
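
Once everything above is installed, a quick end-to-end check can catch setup problems early. The sketch below is not part of the repo; it assumes the Stanford NER folder was renamed to stanford-ner/ as described above, and the classifier filename is the one shipped in the official distribution (it may differ between releases).

```python
# Sanity check: NLTK data, the SpaCy model, and Stanford NER.
# Stanford NER additionally requires Java on your PATH.
import spacy
from nltk.corpus import stopwords
from nltk.tag import StanfordNERTagger

# NLTK data (installed via `python -m nltk.downloader all`).
assert 'the' in stopwords.words('english')

# SpaCy model (installed via `python -m spacy download en_core_web_md`).
nlp = spacy.load('en_core_web_md')
doc = nlp('Apple is looking at buying a U.K. startup.')
print([(ent.text, ent.label_) for ent in doc.ents])

# Stanford NER, loaded through NLTK's wrapper; the classifier path
# is the one bundled with the official distribution.
tagger = StanfordNERTagger(
    'stanford-ner/classifiers/english.all.3class.distsim.crf.ser.gz',
    'stanford-ner/stanford-ner.jar')
print(tagger.tag('Barack Obama visited Stanford University .'.split()))
```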

Usage

  1. Create Twitter and Reddit accounts and follow the accounts you are interested in.
  2. Copy config-sample.json, rename the copy to config.json in the same directory, and fill in the API keys. (Go to your Twitter/Reddit developer console, create an application, and obtain the keys; see the client sketch after this list.)
  3. Crawl Twitter data by running crawlers/twitter_crawler.py. It automatically crawls data and saves it to dataset/twitter/ by default (a conceptual sketch of this step also follows below).
  4. You can customize data entities by modifying domains.json and types.json. (See demo.)
  5. Currently, you can execute demo/demo_howard.ipynb or other notebooks to see the daily digest.
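
For reference, here is a minimal sketch of how the keys in config.json might be loaded and turned into API clients. It assumes the commonly used tweepy and praw libraries; the key names below are illustrative, and the authoritative ones are whatever config-sample.json defines.

```python
import json

import praw    # Reddit API wrapper
import tweepy  # Twitter API wrapper

with open('config.json') as f:
    config = json.load(f)

# Twitter client (key names are illustrative).
tw = config['twitter']
auth = tweepy.OAuthHandler(tw['consumer_key'], tw['consumer_secret'])
auth.set_access_token(tw['access_token'], tw['access_token_secret'])
twitter = tweepy.API(auth)
print(twitter.verify_credentials().screen_name)

# Read-only Reddit client (key names are illustrative).
rd = config['reddit']
reddit = praw.Reddit(client_id=rd['client_id'],
                     client_secret=rd['client_secret'],
                     user_agent='digestant')
print(reddit.read_only)  # True for app-only credentials
```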

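Conceptually, a crawler run boils down to fetching your home timeline and dumping the raw tweets under dataset/twitter/. The sketch below illustrates that idea and is not the actual crawlers/twitter_crawler.py; the config key names and output filename are assumptions.

```python
import datetime
import json
import pathlib

import tweepy

with open('config.json') as f:
    tw = json.load(f)['twitter']  # illustrative key names

auth = tweepy.OAuthHandler(tw['consumer_key'], tw['consumer_secret'])
auth.set_access_token(tw['access_token'], tw['access_token_secret'])
api = tweepy.API(auth, wait_on_rate_limit=True)

# Fetch the most recent tweets from the home timeline.
tweets = api.home_timeline(count=200, tweet_mode='extended')

# Save the raw tweet JSON under dataset/twitter/, one file per day.
out_dir = pathlib.Path('dataset/twitter')
out_dir.mkdir(parents=True, exist_ok=True)
out_file = out_dir / f'{datetime.date.today().isoformat()}.json'
with out_file.open('w') as f:
    json.dump([t._json for t in tweets], f)
print(f'Saved {len(tweets)} tweets to {out_file}')
```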

Languages

Jupyter Notebook 96.8%, Python 3.2%