Twitter Sentiment Analysis Streaming

Summary

Stream tweets for a given set of keywords and generate sentiment analysis scores for each topic in real time.

Usage

cargo run --release -- config.yaml

This application expects a YAML file that contains the following fields:

# change these fields to whatever your Twitter secrets are
access_token: "secret string"
access_token_secret: "secret string"
consumer_key: "secret string"
consumer_secret: "secret string"

# don't have to change this
keywords:
  - "twitter"
  - "facebook"
  - "google"
  - "travel"
  - "art"
  - "music"
  - "photography"
  - "love"
  - "fashion"
  - "food"

General Architecture

If this service needed to scale to handle more data, I would split the data processing, sentiment analysis, and graphing components into separate microservices. Given the time constraints and the size of the project, I decided to build a monolith.

The API keys and secrets needed to authenticate with the Twitter API are read from the YAML config file passed on the command line, which keeps them out of the source code.

The application uses a library to interact with the Twitter API, which exposes tweets as a Tokio stream. The stream is processed one tweet at a time: each tweet is passed to a sentiment analysis library, and the resulting score is appended to a vector that holds the time series data.
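The per-tweet flow described above can be sketched with the standard library alone. This is a minimal illustration, not the project's code: `score_text` is a hypothetical word-list scorer standing in for the real sentiment analysis library, `SentimentPoint` is an assumed name for a time series entry, and a plain iterator stands in for the Tokio stream.

```rust
/// One entry in the time series: when the tweet arrived and how it scored.
/// (Hypothetical type; the repository may store its data differently.)
#[derive(Debug, Clone, PartialEq)]
pub struct SentimentPoint {
    pub timestamp_secs: u64,
    pub score: i32,
}

/// Hypothetical stand-in for the sentiment library: +1 for each positive
/// word, -1 for each negative word, 0 for everything else.
pub fn score_text(text: &str) -> i32 {
    const POSITIVE: [&str; 3] = ["love", "great", "good"];
    const NEGATIVE: [&str; 3] = ["hate", "awful", "bad"];
    text.split_whitespace()
        // Strip surrounding punctuation and normalize case before matching.
        .map(|word| word.trim_matches(|c: char| !c.is_alphanumeric()).to_lowercase())
        .map(|word| {
            if POSITIVE.contains(&word.as_str()) {
                1
            } else if NEGATIVE.contains(&word.as_str()) {
                -1
            } else {
                0
            }
        })
        .sum()
}

/// Consume tweets one at a time and append a scored point for each.
/// `tweets` is any iterator of (arrival time, text) pairs; in the real
/// application this role is played by the Tokio stream.
pub fn append_scores<I>(tweets: I, series: &mut Vec<SentimentPoint>)
where
    I: IntoIterator<Item = (u64, String)>,
{
    for (timestamp_secs, text) in tweets {
        let score = score_text(&text);
        series.push(SentimentPoint { timestamp_secs, score });
    }
}
```

Keeping the scoring function separate from the loop that appends to the vector means the scorer can be swapped for the real library without touching the ingestion logic.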

I have some logic for handling configuration, and I use the typestate pattern to verify the configs. The main function handles the Tokio and webserver setup: it moves the Tokio streaming logic to its own thread and starts a webserver that is meant to serve the graph. Unfortunately, I was not able to serve the graph, because I couldn't find a clean way for the method that handles the GET request to hold a reference to the data being ingested by the tweet streaming logic.
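For readers unfamiliar with the typestate pattern, here is a minimal sketch of how config verification might use it. The names (`Config`, `Unverified`, `Verified`, `verify`, `keyword_count`) and the specific checks are illustrative assumptions, not taken from this repository. The idea is that code which needs a checked config asks for a `Config<Verified>`, so passing an unchecked one fails at compile time.

```rust
use std::marker::PhantomData;

/// Marker types naming the two states a config can be in.
pub struct Unverified;
pub struct Verified;

/// A config tagged with its verification state. Only two of the four
/// credential fields are shown to keep the sketch short.
pub struct Config<State> {
    consumer_key: String,
    consumer_secret: String,
    keywords: Vec<String>,
    _state: PhantomData<State>,
}

impl Config<Unverified> {
    /// Freshly loaded configs always start out unverified.
    pub fn new(consumer_key: &str, consumer_secret: &str, keywords: Vec<String>) -> Self {
        Config {
            consumer_key: consumer_key.to_string(),
            consumer_secret: consumer_secret.to_string(),
            keywords,
            _state: PhantomData,
        }
    }

    /// Consume the unverified config and return a verified one only if
    /// every check passes. The checks here are assumed examples.
    pub fn verify(self) -> Result<Config<Verified>, String> {
        if self.consumer_key.is_empty() || self.consumer_secret.is_empty() {
            return Err("missing Twitter credentials".to_string());
        }
        if self.keywords.is_empty() {
            return Err("at least one keyword is required".to_string());
        }
        Ok(Config {
            consumer_key: self.consumer_key,
            consumer_secret: self.consumer_secret,
            keywords: self.keywords,
            _state: PhantomData,
        })
    }
}

impl Config<Verified> {
    /// Methods in this block exist only on verified configs.
    pub fn keyword_count(&self) -> usize {
        self.keywords.len()
    }
}
```

Because `verify` takes `self` by value, the unverified config cannot be used again after the check, and `keyword_count` simply does not exist on `Config<Unverified>`.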

Tokio

The meat of this project is the way we use Tokio. I implement a future that processes a stream, converts each item into a sentiment analysis score, and prints it to STDOUT, so you have a live view of scores as they come in. I set up a separate stream per keyword so that the keywords can be evaluated concurrently.

I also set up a separate task to handle printing values to STDOUT. I was worried that having different futures write to the console right after computing a sentiment score might introduce lock contention, so the futures send their output over an MPSC queue to a single task that does all the printing.
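The single-writer idea can be illustrated with the standard library's channel. This is an analogy under stated assumptions, not the project's Tokio code: OS threads stand in for Tokio tasks, `std::sync::mpsc` stands in for an async MPSC queue, and `fake_score` is a hypothetical placeholder for the sentiment library. The consumer is generic over `Write` so the same function can target STDOUT or an in-memory buffer.

```rust
use std::io::{self, Write};
use std::sync::mpsc;
use std::thread;

/// Hypothetical placeholder score: just the keyword's length.
fn fake_score(keyword: &str) -> usize {
    keyword.len()
}

/// Spawn one producer per keyword; a single consumer owns the writer,
/// so producers never contend for it. Returns the number of lines written.
pub fn run_pipeline<W: Write>(keywords: &[&str], out: &mut W) -> io::Result<usize> {
    let (tx, rx) = mpsc::channel::<String>();

    let producers: Vec<_> = keywords
        .iter()
        .map(|&keyword| {
            let tx = tx.clone();
            let keyword = keyword.to_owned();
            thread::spawn(move || {
                let line = format!("{}: {}", keyword, fake_score(&keyword));
                // Sending fails only if the receiver is gone; ignore that here.
                let _ = tx.send(line);
            })
        })
        .collect();

    // Drop the original sender so the receive loop ends once every
    // producer has finished and dropped its clone.
    drop(tx);

    let mut lines_written = 0;
    for line in rx {
        writeln!(out, "{}", line)?;
        lines_written += 1;
    }

    for handle in producers {
        handle.join().expect("producer thread panicked");
    }
    Ok(lines_written)
}
```

Calling `run_pipeline(&["art", "music"], &mut std::io::stdout())` would print one line per keyword. The order is not guaranteed, since the producers run concurrently.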

Dependencies

At first I wanted to use the rust-twitter-streaming crate, but it doesn't build properly (likely because it relied on nightly async functions and hasn't been updated in the last few months). I settled on the egg-mode crate, which does build properly and can link against rustls, so we don't have to worry about OpenSSL linking errors (which I have dealt with in the past).

I use the structopt crate to parse command line arguments and automatically generate nice help messages. It's one of my favorite crates.

I'm using tokio as the execution context for my app.
