There are 8 repositories under the apache-beam topic.
TFX is an end-to-end platform for deploying production ML pipelines
Google-provided Cloud Dataflow templates for solving in-cloud data tasks
Yet Another UserAgent Analyzer
[DEPRECATED] Kubernetes operator for managing the lifecycle of Apache Flink and Beam applications.
ETL scripts for Bitcoin, Litecoin, Dash, Zcash, Doge, Bitcoin Cash. Available in Google BigQuery https://goo.gl/oY5BCQ
Kubernetes operator for managing the lifecycle of Apache Flink and Beam applications.
A collection of tools for extracting FHIR resources and analytics services on top of that data.
TFRecorder makes it easy to create TensorFlow records (TFRecords) from Pandas DataFrames and CSV files containing images or structured data.
Clojure API for a more dynamic Google Dataflow
Collection of transforms for the Apache Beam Python SDK.
Streaming Ethereum and Bitcoin blockchain data to Google Pub/Sub or Postgres in Kubernetes
Asgarde simplifies error handling in Apache Beam Java with less, more concise, and more expressive code.
Repository to quickly get you started with new Machine Learning projects on Google Cloud Platform. More info (slides):
Export a whole BigQuery table to Google Datastore with Apache Beam/Google Dataflow
Microservices in the post-Kubernetes era: a polyglot monorepo
Some class materials for a data processing course using PySpark
Blockchain ETL Architecture
Asgarde simplifies error handling in Apache Beam Python with less, more concise, and more expressive code.
Apache Beam examples for running on Google Cloud Dataflow.
Efficient streaming data ingestion, transformation & activation
Log analysis pipeline utilizing Apache Beam
This project shows how to derive the total number of training tokens from a large text dataset from 🤗 datasets with Apache Beam and Dataflow.
Convenient Dataflow pipelines for transforming data between cloud data sources
Libraries for efficient and scalable group-structured dataset pipelines.
Code to statistically up-weight conversion values of consenting customers to feed up to 100% of the factual conversion values back into Google Ads.
This project leverages GCS, Composer, Dataflow, BigQuery, and Looker on Google Cloud Platform (GCP) to build a robust data engineering solution for processing, storing, and reporting daily transaction data in the online food delivery industry.
Source code for the YouTube video, Apache Beam Explained in 12 Minutes
An Apache Beam I/O connector for seamless integration with MySQL databases 🔗 https://beam.apache.org/documentation/io/connectors/#other-io-connectors-for-apache-beam
The Proxima platform.
Presents an optimized Apache Beam pipeline for generating sentence embeddings (runnable on Cloud Dataflow).
Projects and studies in the data engineering area