ryanmcdowell / dataflow-bigquery-dynamic-destinations

An example pipeline for dynamically routing events from Pub/Sub to different BigQuery tables based on a message attribute.

Dataflow BigQuery Dynamic Destinations

Occasionally, data from several domains is ingested into a single Pub/Sub topic, and each message must be routed to the appropriate output table in BigQuery. One way to accomplish this is to set an attribute on the Pub/Sub message that names the table the message should be routed to. A single pipeline can then handle the routing by using the dynamic destination capabilities of the BigQueryIO transform. In this pipeline, a SerializableFunction extracts the table attribute from each Pub/Sub message and routes the message to the matching table destination.
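The routing decision described above can be sketched in plain Java, without any Beam dependency. This is a minimal illustration and not the repository's actual code: the class name `TableRouter`, the attribute key `table`, the fallback table `unrouted`, and the project and dataset names are all assumptions made for the example. In the real pipeline, logic along these lines would sit inside the SerializableFunction handed to BigQueryIO.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: plain Java, no Beam dependency.
// All names here are hypothetical and not taken from the repository.
final class TableRouter {

    // Assumed attribute key; the actual pipeline may use a different name.
    static final String TABLE_ATTRIBUTE = "table";

    private TableRouter() {}

    /**
     * Builds a BigQuery table spec of the form "project:dataset.table"
     * from the attributes of a single message. Messages that carry no
     * usable table attribute are sent to the fallback table.
     */
    static String tableSpecFor(Map<String, String> attributes,
                               String project,
                               String dataset,
                               String fallbackTable) {
        String table = attributes == null ? null : attributes.get(TABLE_ATTRIBUTE);
        if (table == null || table.trim().isEmpty()) {
            table = fallbackTable;
        }
        return project + ":" + dataset + "." + table.trim();
    }

    public static void main(String[] args) {
        Map<String, String> attributes = new HashMap<>();
        attributes.put(TABLE_ATTRIBUTE, "orders");
        // Prints "my-project:events.orders"
        System.out.println(tableSpecFor(attributes, "my-project", "events", "unrouted"));
    }
}
```

Sending unroutable messages to a fallback table is one possible design choice; a pipeline could equally reject such messages or divert them to a dead-letter output.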

Pipeline

PubsubToBigQueryDynamicDestinations - A pipeline which consumes JSON messages from Pub/Sub and writes the records to BigQuery tables, using a Pub/Sub message attribute to determine the table each message is routed to.

Getting Started

Requirements

  • Java 8
  • Maven 3

Building the Project

Build the entire project using the Maven compile command.

mvn clean && mvn compile

License: Apache License 2.0

Languages

Language: Java 100.0%