danielvdende / apache_streaming_experiments

Apache Streaming Experiments

This repo contains a Vagrantfile and provisioning scripts for a number of Apache streaming analytics frameworks. The goal is to let you try out these frameworks with as low a barrier as possible, without taking away the joy (and pain) of configuring and setting up a working environment yourself. Each box has only the minimal basics for trying out its framework (mostly: updating packages, installing Java, and fetching the tarball). Each box also has a synced folder, which is shared between the virtual environment and the host machine you're running it on. The synced folder can be configured as necessary.

An Apache Kafka box is provided to serve as the message source for the streaming frameworks. A producer script is also provided; it simulates an endless stream by looping over any input file indefinitely (assuming the entries in the file are separated by newlines). Some example data files are included as well. NOTE: The producer does not do ANY parsing of the data; it simply pushes each message as-is onto the Kafka topic (i.e. as a String). If you want anything else (e.g. JSON, CSV, protobuf, etc.) you are of course welcome to add this. You can grab a time series dataset from: https://datamarket.com/data/list/?q=provider%3Atsdl. The speed of message production can be controlled via config variables in the script.
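To make the idea of the endless producer concrete, here is a minimal sketch of how such a loop could look. It is not the script shipped in this repo: the function names (`endless_lines`, `produce`, `kafka_sender`), the `delay_seconds` and `limit` parameters, and the use of the third-party `kafka-python` package are all assumptions made for illustration.

```python
import itertools
import time
from typing import Callable, Iterable, Iterator, Optional


def endless_lines(path: str) -> Iterator[str]:
    """Yield the non-empty lines of `path` forever, restarting at end of file."""
    while True:
        yielded = False
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                entry = line.rstrip("\r\n")
                if entry:
                    yielded = True
                    yield entry  # passed on as-is, no parsing
        if not yielded:
            return  # stop instead of spinning forever on an empty file


def produce(
    entries: Iterable[str],
    send: Callable[[str], None],
    delay_seconds: float = 0.0,
    limit: Optional[int] = None,
) -> int:
    """Push entries through `send`, pausing `delay_seconds` between messages.

    `limit` (hypothetical) caps the number of messages; None means no cap.
    Returns the number of messages sent.
    """
    sent = 0
    for entry in itertools.islice(entries, limit):
        send(entry)
        sent += 1
        if delay_seconds > 0:
            time.sleep(delay_seconds)
    return sent


def kafka_sender(topic: str, bootstrap_servers: str) -> Callable[[str], None]:
    """Build a `send` callable backed by kafka-python (assumed to be installed)."""
    from kafka import KafkaProducer  # third-party: pip install kafka-python

    producer = KafkaProducer(bootstrap_servers=bootstrap_servers)

    def send(entry: str) -> None:
        # Messages go onto the topic as plain UTF-8 strings.
        producer.send(topic, entry.encode("utf-8"))

    return send
```

With a broker reachable from where the script runs, a call such as `produce(endless_lines("data.txt"), kafka_sender("events", "localhost:9092"), delay_seconds=0.5)` would send one line every half second until interrupted; the file name, topic, and address here are placeholders. Keeping `send` as a plain callable means the loop can be tried out without a running broker, for example by passing `print`.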
