diegodorgam / twitter-hot-urls-example

Example Giant Swarm service, tracking URLs mentioned on Twitter and creating a ranked list

twitter-hot-urls-example (thux)

This repository features an example service consisting of multiple components working hand in hand to collect URLs mentioned on Twitter and create a hotlist of popular URLs.

It can be run with docker-compose or Kubernetes.

Contents:

Component Overview

Check out the docker-compose.yml files for a more technical description of what this example service provides. Compare them with the Kubernetes manifests in the kubernetes folder.

Component Overview

tracker

This component consumes the Twitter Stream API, looking for tweets containing the strings http or https to fetch all tweets with links. The tweets are then parsed for contained URLs.

The URLs found are stored in the inbox redis database.
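
The parsing step can be sketched roughly as follows. This is a minimal illustration, not the tracker's actual code: the regular expression, the function names `extract_urls` and `store_urls`, and the use of a plain Python list in place of the inbox Redis database are all assumptions made for the example.

```python
import re

# Hypothetical pattern for the example; the real tracker may instead
# read the URL entities that Twitter attaches to each tweet.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def extract_urls(tweet_text):
    """Return the URLs found in a tweet's text, in order of appearance."""
    urls = []
    for match in URL_PATTERN.findall(tweet_text):
        # Drop trailing punctuation that often follows a link in prose.
        urls.append(match.rstrip(".,;:!?)"))
    return urls


def store_urls(inbox, urls):
    """Append URLs to a queue-like inbox (a list here, standing in for Redis)."""
    for url in urls:
        inbox.append(url)


if __name__ == "__main__":
    inbox = []
    store_urls(inbox, extract_urls("Read https://example.com/a, then http://t.co/xyz!"))
    print(inbox)
```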

inbox

This component is a simple Redis database that receives all found URLs from the tracker component. It makes use of the official Redis Docker image.

This component deliberately does not provide a volume, so the database content is lost whenever the component is restarted.

resolver

The script inside this component reads URLs from the inbox Redis database and sends a request to each one, following any redirects to reveal the actual target URL. The resulting URL is stored in the hotlist Redis database.

To prevent accessing the same URL several times, a cache is maintained in the hotlist Redis.

The resolver component can be thought of as a worker, processing jobs from a queue. Since resolving URLs is in many cases a time-consuming job, there can be multiple instances of this component working in parallel.
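
The worker loop with its cache might look something like the sketch below. It is a simplified illustration under several assumptions: the inbox is a plain list and the cache and scores are dictionaries (the real service keeps them in Redis), the function names are made up for this example, and counting one point per mention is only a guess at how the scoring works.

```python
import urllib.request


def fetch_final_url(url, timeout=10):
    """Follow redirects and return the final URL (performs a real HTTP request)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.geturl()


def resolve(url, cache, fetch=fetch_final_url):
    """Return the target of url, consulting the cache before making a request."""
    if url in cache:
        return cache[url]
    target = fetch(url)
    cache[url] = target
    return target


def process_inbox(inbox, scores, cache, fetch=fetch_final_url):
    """Drain the inbox, resolve each URL and count one mention per entry."""
    while inbox:
        url = inbox.pop(0)
        target = resolve(url, cache, fetch)
        # Assumed scoring rule for the example: one point per mention.
        scores[target] = scores.get(target, 0) + 1
```

Passing the fetch function as a parameter keeps the loop testable without network access; several such workers could run in parallel as long as they share the same inbox and cache.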

resolver-scaler

This component contains a little script that watches the size of the inbox Redis database to find out whether it remains constant. If the inbox is growing, the script logs this and reports that more resolver instances should be started to keep the inbox from growing too large.
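
One possible shape for such a watcher is sketched below. The growth rule (compare the newest sample with the oldest in a small window), the names `inbox_is_growing` and `watch`, and the `get_size` callback standing in for a Redis size query are assumptions for illustration, not the component's actual logic.

```python
import logging
import time


def inbox_is_growing(samples):
    """Return True if the newest size sample exceeds the oldest one."""
    if len(samples) < 2:
        return False
    return samples[-1] > samples[0]


def watch(get_size, rounds, window=3, interval=0.0):
    """Sample the inbox size `rounds` times and return the number of warnings logged."""
    samples = []
    warnings = 0
    for _ in range(rounds):
        samples.append(get_size())
        samples = samples[-window:]  # keep only the most recent samples
        if inbox_is_growing(samples):
            logging.warning(
                "inbox grew from %d to %d entries; consider more resolver instances",
                samples[0], samples[-1],
            )
            warnings += 1
        time.sleep(interval)
    return warnings
```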

As a future improvement, the resolver-scaler can be modified to actually initiate the scaling of the resolver component via the Giant Swarm API.

hotlist

This second Redis database component stores all resolved URLs together with scoring information. It also contains the cache for the resolver. Just like the inbox component, we use the official Redis Docker image here.

In contrast to the inbox component, the hotlist provides a volume to persist the database throughout restarts.

hotlist-cleaner

This component contains a little helper that periodically removes outdated information from the hotlist Redis database.
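
What counts as "outdated" is not spelled out here. As a sketch, assuming each URL carries a last-seen timestamp and entries older than a maximum age are dropped, the cleanup step could look like this; the dictionaries stand in for the hotlist Redis database and all names are hypothetical.

```python
def remove_outdated(last_seen, scores, now, max_age_seconds):
    """Delete entries not seen within max_age_seconds and return the removed URLs."""
    cutoff = now - max_age_seconds
    stale = [url for url, seen_at in last_seen.items() if seen_at < cutoff]
    for url in stale:
        del last_seen[url]
        scores.pop(url, None)  # tolerate URLs that have no score entry
    return stale
```

A periodic helper would call this in a loop with `now=time.time()` and sleep between runs.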

frontend

This is a Python/Flask web application that offers a JSON API to fetch the resulting URL hotlist.
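
The JSON body returned by such an API might be assembled as in the sketch below. The Flask route wiring is left out, and the field names `url` and `score`, the `limit` parameter, and the function name are assumptions for this example rather than the frontend's real response format.

```python
import json


def build_hotlist_payload(scores, limit=10):
    """Return a JSON string with the highest-scoring URLs first."""
    # Sort by descending score; break ties alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    entries = [{"url": url, "score": score} for url, score in ranked[:limit]]
    return json.dumps({"hotlist": entries})
```

In a Flask view, the returned string could be sent with an `application/json` content type.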

rebrow

The rebrow component offers a web-based user interface ("rebrow" stands for "redis browser") to debug the content of both Redis databases. It makes use of a third-party Docker image.

Credentials to Access Twitter API

Accessing the Twitter streaming API requires a personal Twitter account and app-specific credentials, which are created in Twitter Application Management.

For example:

Name: thux
Description: Tracks URLs mentioned on Twitter and creates a ranked list
Website: https://github.com/giantswarm/twitter-hot-urls-example
Callback URL: <leave this field blank>

Additionally, an access token needs to be generated under "Keys and Access Tokens". In the end, four secrets or tokens need to be entered in secrets/twitter-api-secret.env for the docker-compose setup, or in secrets/twitter-api-secret.yaml to run the Kubernetes example. For Kubernetes, these values need to be base64-encoded; see the Kubernetes documentation about secrets.
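
The base64 encoding for the Kubernetes file can be done with any tool; a small Python sketch is shown below. The key names and placeholder values in the usage part are hypothetical and need to be matched to the keys that secrets/twitter-api-secret.yaml actually expects.

```python
import base64


def encode_secret_value(value):
    """Return the base64 encoding of a secret string, as used in Kubernetes Secret data."""
    return base64.b64encode(value.encode("utf-8")).decode("ascii")


if __name__ == "__main__":
    # Hypothetical key names and placeholder values for illustration only.
    secrets = {
        "consumer-key": "replace-me",
        "consumer-secret": "replace-me",
        "access-token": "replace-me",
        "access-token-secret": "replace-me",
    }
    for name, value in secrets.items():
        print(f"{name}: {encode_secret_value(value)}")
```

Make sure the value being encoded has no trailing newline, since it would become part of the secret.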

Starting with Docker Compose

docker-compose up -d
docker-compose ps
docker-compose logs

docker-compose stop tracker

docker network ls
docker network inspect thux_default

Languages

Python 79.3%, HTML 20.7%