bact / socialmedia

Play with data from social media

get-stream.py

A script to collect tweets from the Twitter stream.

  • It uses Twitter's Streaming API to get tweets in a specified language, filtered by 400 common words in that language.
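As a minimal sketch of the filtering step, the request parameters could be built as below. `build_filter_params` and `MAX_TRACK_KEYWORDS` are hypothetical names, not taken from get-stream.py; the sketch assumes the v1.1 statuses/filter parameters `track` and `language`, and that the 400-word figure corresponds to the keyword limit of the standard stream.

```python
# Assumption: the standard filter stream accepts at most 400 track keywords,
# which matches the 400 common words mentioned above.
MAX_TRACK_KEYWORDS = 400


def build_filter_params(words, language):
    """Build query parameters for a filtered stream request (hypothetical helper)."""
    seen = set()
    keywords = []
    for word in words:
        word = word.strip()
        # Skip blank entries and duplicates while preserving the original order.
        if word and word not in seen:
            seen.add(word)
            keywords.append(word)
    return {
        "language": language,
        # Keywords are comma-separated; anything past the limit is dropped.
        "track": ",".join(keywords[:MAX_TRACK_KEYWORDS]),
    }


params = build_filter_params(["the", "and", "the", " "], "en")
print(params)
```

The resulting dictionary would then be passed to whichever streaming client the script uses; the network call itself is left out here because it depends on credentials and on the client library.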

Common words

Plan

  • Check whether the tweet is a retweet
    • If it is a retweet, does it have additional text? (check id_str)
  • Get the full tweet text, with no truncation
    • Check for truncated=True
  • Keep these attributes: id, retweet_count, favorite_count, retweeted_status (id, favorite_count, retweet_count)
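The plan above could be sketched roughly as follows. `summarize_tweet` is a hypothetical helper, not the script's actual code. It assumes tweets arrive as dictionaries in the v1.1 streaming format, where a truncated tweet carries its complete text in `extended_tweet.full_text` and a retweet carries the original tweet in `retweeted_status`. The `text` key in the result is an addition for illustration.

```python
def summarize_tweet(tweet):
    """Reduce a tweet dict to the attributes listed in the plan (hypothetical helper)."""
    # Assumption: truncated streaming tweets keep the untruncated text
    # under extended_tweet.full_text.
    if tweet.get("truncated") and "extended_tweet" in tweet:
        text = tweet["extended_tweet"]["full_text"]
    else:
        text = tweet.get("full_text", tweet.get("text", ""))

    summary = {
        "id": tweet["id"],
        "text": text,
        "retweet_count": tweet.get("retweet_count", 0),
        "favorite_count": tweet.get("favorite_count", 0),
    }

    # A retweet embeds the original tweet under retweeted_status.
    original = tweet.get("retweeted_status")
    if original is not None:
        summary["retweeted_status"] = {
            "id": original["id"],
            "favorite_count": original.get("favorite_count", 0),
            "retweet_count": original.get("retweet_count", 0),
        }
    return summary


retweet = {
    "id": 2,
    "truncated": False,
    "text": "RT @someone: hello",
    "retweeted_status": {"id": 1, "favorite_count": 5, "retweet_count": 3},
}
long_tweet = {
    "id": 3,
    "truncated": True,
    "text": "the whole...",
    "extended_tweet": {"full_text": "the whole message"},
}
print(summarize_tweet(retweet))
print(summarize_tweet(long_tweet))
```

The check for additional text on a retweet (the id_str item) is not covered by this sketch.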

make-train-data.py

Converts tweets in JSON format to the __label__X text text text text format required by fastText.
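A minimal sketch of that conversion is shown below. `to_fasttext_line` is a hypothetical name, and the way the label is chosen is an assumption: here the caller supplies it, since how make-train-data.py derives labels is not described.

```python
import json


def to_fasttext_line(label, text):
    """Format one training example as '__label__X text' (hypothetical helper)."""
    # fastText reads one example per line, so collapse newlines, tabs,
    # and repeated spaces into single spaces.
    cleaned = " ".join(text.split())
    return f"__label__{label} {cleaned}"


# One tweet serialized as JSON; only the "text" field is used in this sketch.
record = json.loads('{"id": 1, "text": "good\\nmorning   all"}')
line = to_fasttext_line("pos", record["text"])
print(line)
```

Writing one such line per tweet to a text file gives input in the form expected by fastText's supervised mode.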

License: GNU General Public License v3.0

Languages

Python 100.0%