This repo includes mining Twitter data for understanding shopping habits of people.
- A Twitter Scraper is implemented in order to connect twitter stream API and MongoDB through python. The script to run the scraper is
python twitter_stream.py -71.191261,42.227654,-70.804482,42.39698 Boston_db
The first argument is the bounding box which you can specify and the second argument is the mongo database name that you want to store the scraped data.
- With the scraped Twitter data, the first type of analysis is pulling data directly from Mongodb then analyze some direct statistics, such as word frequency, the most popular hashtags for a specific location. Related work can be found in the notebook twitter_from_mongo.ipynb.
- The second type of analysis includes the twitter mentioning brand names in our brand list. Examples include category-based popularity, brand sentiment score, can be found in twitter_brand_analysis.ipynb
- The third type of analysis is based on brands' followers. I use LDA model to extract trending topics from brands' followers' twitter timeline. This work is inspired by https://github.com/peimengsui/datatalks. Sample analysis and visualizaiton can be found in twitter_follower_topic.ipynb
- The last type of analysis is twitter_coocurance.ipynb, analyzing coocurrence of words for specific subset of twitter data. This work can be applied to analyze twitter for specific shopping event.