There are 8 repositories under the apache-beam topic.
TFX is an end-to-end platform for deploying production ML pipelines
Cloud Dataflow Google-provided templates for solving in-Cloud data tasks
Yet Another UserAgent Analyzer
[DEPRECATED] Kubernetes operator for managing the lifecycle of Apache Flink and Beam applications.
ETL scripts for Bitcoin, Litecoin, Dash, Zcash, Doge, Bitcoin Cash. Available in Google BigQuery https://goo.gl/oY5BCQ
Kubernetes operator for managing the lifecycle of Apache Flink and Beam applications.
TFRecorder makes it easy to create TensorFlow records (TFRecords) from Pandas DataFrames and CSV files containing images or structured data.
A collection of tools for extracting FHIR resources and analytics services on top of that data.
Clojure API for a more dynamic Google Dataflow
Collection of transforms for the Apache Beam Python SDK.
Streaming Ethereum and Bitcoin blockchain data to Google Pub/Sub or Postgres in Kubernetes
Repository to quickly get you started with new Machine Learning projects on Google Cloud Platform. More info (slides):
Export a whole BigQuery table to Google Datastore with Apache Beam/Google Dataflow
Some class materials for a data processing course using PySpark
Microservices in Post-Kubernetes Era. A polyglot monorepo
Blockchain ETL Architecture
Apache Beam examples for running on Google Cloud Dataflow.
Efficient streaming data ingestion, transformation & activation
Log analysis pipeline utilizing Apache Beam
Convenient Dataflow pipelines for transforming data between cloud data sources
This project shows how to derive the total number of training tokens from a large text dataset from 🤗 datasets with Apache Beam and Dataflow.
Libraries for efficient and scalable group-structured dataset pipelines.
Apache Beam I/O connector designed for accessing MySQL databases. https://beam.apache.org/documentation/io/connectors/#other-io-connectors-for-apache-beam
Presents an optimized Apache Beam pipeline for generating sentence embeddings (runnable on Cloud Dataflow).
Projects and studies in the data engineering area
Source code for the YouTube video, Apache Beam Explained in 12 Minutes
Code to statistically up-weight conversion values of consenting customers to feed up to 100% of the factual conversion values back into Google Ads.
The Proxima platform.
This project leverages GCS, Composer, Dataflow, BigQuery, and Looker on Google Cloud Platform (GCP) to build a robust data engineering solution for processing, storing, and reporting daily transaction data in the online food delivery industry.