Streaming Dataflow Samples

Beam Pipelines - Streaming Analytics Techniques

This project demonstrates several Apache Beam techniques for streaming analytics.

Running the demo

  1. Create a GCP project
  2. Create a file named terraform.tfvars in the terraform directory with the following content:
    project_id = "<GCP Project Id>"

    There are additional Terraform variables that can be overridden; see variables.tf for details.
  3. Set the environment variables used by the scripts below:
    export PROJECT_ID=<project-id>
    export GCP_REGION=us-central1
    export BIGQUERY_REGION=us-central1
  4. Create BigQuery tables, Pub/Sub topics and subscriptions, and GCS buckets by running this script:
    source ./setup-env.sh
  5. Start the event generation process:
    ./start-event-generation.sh
  6. Start the event processing pipeline:
    (cd pipeline; ./run-streaming-pipeline.sh)
  7. Optionally, start the pipeline that ingests the findings published as Pub/Sub messages into BigQuery (a sketch for verifying the running jobs follows this list):
    ./start-findings-to-bigquery-pipeline.sh
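
Once the pipelines are started, you can confirm the Dataflow jobs are up. A minimal check, assuming the gcloud CLI is installed and authenticated against the same project:

    # List the active Dataflow jobs in the region configured above;
    # the pipelines launched in the steps above should appear in the Running state.
    gcloud dataflow jobs list \
      --project="${PROJECT_ID}" \
      --region="${GCP_REGION}" \
      --status=active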

Cleaning up

  1. Shut down the pipelines via the GCP console (TODO: add scripts; a possible gcloud sketch follows this list)
  2. Run this command:
    (cd terraform; terraform destroy)
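
Until shutdown scripts are added, the following sketch drains the pipelines with gcloud (an assumption, not part of this repo; draining lets a streaming pipeline finish in-flight work before stopping):

    # Request a drain for every active Dataflow job in the region.
    for JOB_ID in $(gcloud dataflow jobs list \
        --project="${PROJECT_ID}" \
        --region="${GCP_REGION}" \
        --status=active \
        --format='value(id)'); do
      gcloud dataflow jobs drain "${JOB_ID}" --region="${GCP_REGION}"
    done

Note that this drains every active job in the region, so narrow the list if the project runs other pipelines.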

Alternatively, delete the project you created.
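
If the project was created just for this demo, a single command removes it along with all of its resources (gcloud asks for confirmation before shutting the project down):

    gcloud projects delete "${PROJECT_ID}"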

Disclaimer

The techniques and code contained here are not supported by Google and are provided as-is under the Apache license. This repo provides some options that you can investigate, evaluate, and employ if you choose to.
