tweet_processor

Overview

This is a toy project that tracks and queries tweet streams using AWS Kinesis Firehose and Redshift.

This project is meant as a way to learn how to use the following:

AWS EC2 setup:

Python scripting:

Twitter / tweet tracking:

Other AWS services:

tweet_basketball.py and tweet_baseball.py:

Connect to the Twitter API using bot tokens.
Track tweet streams that match a specific term (basketball for one script, baseball for the other).
Extract the created_at and id_str fields from each tweet record.
Store the tweet record in /tmp/<term>.log.
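
The extract-and-store steps above can be sketched as follows. This is a minimal illustration, not the scripts' actual code: the helper names extract_fields and append_record are hypothetical, and the one-JSON-object-per-line log format is an assumption, since the on-disk format is not specified here. The streaming connection itself is omitted.

```python
import json


def extract_fields(raw_tweet: str) -> dict:
    """Keep only created_at and id_str from one raw tweet JSON string."""
    tweet = json.loads(raw_tweet)
    return {"created_at": tweet["created_at"], "id_str": tweet["id_str"]}


def append_record(record: dict, term: str, log_dir: str = "/tmp") -> str:
    """Append one record as a JSON line to <log_dir>/<term>.log; return the path.

    The JSON-lines format is an assumption made for this sketch.
    """
    path = f"{log_dir}/{term}.log"
    with open(path, "a", encoding="utf-8") as log:
        log.write(json.dumps(record) + "\n")
    return path


if __name__ == "__main__":
    # A made-up tweet payload containing the two fields of interest.
    sample = (
        '{"created_at": "Wed Oct 10 20:19:24 +0000 2018", '
        '"id_str": "1050118621198921728", "text": "basketball tonight"}'
    )
    print(extract_fields(sample))
```

Appending one record per line keeps the file easy to tail, which suits an agent that watches the log for new records.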

Kinesis Firehose:

The agent on the stream source side is configured by /etc/aws-kinesis/agent.json.
Configure the Kinesis agent to monitor the two /tmp/<term>.log files.
Send new records to the corresponding delivery streams.
Each delivery stream is then configured to:
- Store the records to S3 first.
- Then use COPY command to copy the records to Redshift.
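
As a rough illustration of the agent configuration described above, the sketch below generates an agent.json body with one flow per log file. The "flows", "filePattern", and "deliveryStream" keys follow the Kinesis agent's configuration format. The delivery stream names (a "tweets-" prefix plus the term) and the helper build_agent_config are hypothetical, since the real stream names are not given here.

```python
import json


def build_agent_config(terms, stream_prefix="tweets-"):
    """Build a Kinesis agent config with one Firehose flow per tracked term.

    stream_prefix is a placeholder; substitute the real delivery stream names.
    """
    return {
        "flows": [
            {
                "filePattern": f"/tmp/{term}.log",
                "deliveryStream": f"{stream_prefix}{term}",
            }
            for term in terms
        ]
    }


if __name__ == "__main__":
    # Print the JSON that would be written to /etc/aws-kinesis/agent.json.
    print(json.dumps(build_agent_config(["basketball", "baseball"]), indent=2))
```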

Redshift:

count_tweets.py:

For each table, query the database for the number of tweets that are at most 10 minutes older than the newest tweet.
Store the result in memcache.
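
The "at most 10 minutes older than the newest tweet" condition can be sketched as below. This is an illustration under assumptions rather than the script's actual query: it assumes each table has a created_at column of a timestamp type, and the names build_count_query and count_recent are hypothetical. The pure-Python function mirrors the SQL logic so that it can be checked without a database; the memcache write is left out.

```python
from datetime import datetime, timedelta


def build_count_query(table: str) -> str:
    """Return a SQL query counting rows within 10 minutes of the newest row.

    Assumes created_at is stored as a timestamp column in the given table.
    """
    return (
        f"SELECT COUNT(*) FROM {table} "
        f"WHERE created_at >= (SELECT MAX(created_at) FROM {table}) "
        f"- INTERVAL '10 minutes'"
    )


def count_recent(timestamps, window=timedelta(minutes=10)):
    """Count timestamps that are at most `window` older than the newest one."""
    if not timestamps:
        return 0
    newest = max(timestamps)
    return sum(1 for ts in timestamps if newest - ts <= window)


if __name__ == "__main__":
    base = datetime(2018, 10, 10, 20, 0, 0)
    # Tweets that are 0, 5, 10, and 11 minutes older than the newest one.
    sample = [base - timedelta(minutes=m) for m in (0, 5, 10, 11)]
    print(build_count_query("basketball"))
    print(count_recent(sample))
```

Measuring the window from the newest tweet rather than from the current time means the count stays meaningful even when the stream has been idle for a while.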

simple_http.py:

A toy project that tracks and queries tweet streams using AWS Kinesis Firehose and Redshift.
