redvg / dataflow-cpb101-pipeline-mapreduce-py

GCP Dataflow pipeline with mapreduce in python

dataflow-cpb101-pipeline-mapreduce-py

Follows the lab material at:
https://codelabs.developers.google.com/codelabs/cpb101-mapreduce-dataflow-py
https://www.udemy.com/gcp-data-engineer-and-cloud-architect/learn/v4/t/lecture/7598622?start=0
https://github.com/GoogleCloudPlatform/training-data-analyst/tree/master/courses/data_analysis/lab2/python
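The linked lab finds the most-used Java packages in a set of source files: a map step emits `(package, 1)` for each import statement (and for each parent package), a reduce step sums the counts per package, and a final step keeps the top N. Below is a minimal pure-Python sketch of that map/reduce logic, assuming the scripts here follow the same lab. It does not use Apache Beam, and the function names are hypothetical rather than taken from this repo. In a Beam pipeline the map step corresponds roughly to `beam.FlatMap`, the reduce step to `beam.CombinePerKey(sum)`, and the top-N step to `Top.Of` from `apache_beam.transforms.combiners`.

```python
import heapq
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def package_prefixes(package: str) -> List[str]:
    """Expand 'com.example.app' into ['com', 'com.example', 'com.example.app']."""
    parts = package.split(".")
    return [".".join(parts[: i + 1]) for i in range(len(parts))]


def map_line(line: str, keyword: str = "import") -> Iterator[Tuple[str, int]]:
    """Map step: emit (prefix, 1) for the imported package and each parent package."""
    line = line.strip()
    if not line.startswith(keyword + " "):
        return  # not an import statement: emit nothing
    package = line[len(keyword):].split(";", 1)[0].strip()
    if package:
        for prefix in package_prefixes(package):
            yield prefix, 1


def reduce_counts(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Reduce step: sum the counts for each key."""
    totals: Dict[str, int] = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)


def top_packages(lines: Iterable[str], n: int = 5) -> List[Tuple[str, int]]:
    """Run map, then reduce, then keep the n packages with the highest counts."""
    pairs = (pair for line in lines for pair in map_line(line))
    totals = reduce_counts(pairs)
    return heapq.nlargest(n, totals.items(), key=lambda kv: kv[1])


lines = [
    "import java.util.List;",
    "import java.util.Map;",
    "import com.google.cloud.Foo;",
    "public class Example {}",
]
print(top_packages(lines, 2))  # [('java', 2), ('java.util', 2)]
```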

Prerequisites

pip install google-cloud-dataflow oauth2client==3.0.0

pkg_popularity_pipeline_local.py

Runs the pipeline locally.
Pass the correct input path with --input.
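A minimal sketch of how a local script can accept `--input` while passing any other flags through to the pipeline. It uses `argparse.ArgumentParser.parse_known_args`, which returns the recognised options plus a list of the remaining arguments (these could then go to `beam.Pipeline(argv=...)`). The default path, the `*.java` pattern, and the helper names are hypothetical, not necessarily what `pkg_popularity_pipeline_local.py` does.

```python
import argparse
from typing import List, Optional, Tuple

# Hypothetical default: point this at your own directory of input files.
DEFAULT_INPUT = "./javahelp/"


def parse_args(argv: Optional[List[str]] = None) -> Tuple[argparse.Namespace, List[str]]:
    """Parse --input; unrecognised flags are returned untouched for the pipeline."""
    parser = argparse.ArgumentParser(description="Find the most used packages (local run)")
    parser.add_argument("--input", default=DEFAULT_INPUT, help="Directory containing the input files")
    return parser.parse_known_args(argv)


def input_glob(input_dir: str, pattern: str = "*.java") -> str:
    """Build the file pattern for the text reader, adding a trailing '/' if missing."""
    if not input_dir.endswith("/"):
        input_dir += "/"
    return input_dir + pattern


options, pipeline_args = parse_args(["--input", "/data/src", "--runner=DirectRunner"])
print(options.input, pipeline_args, input_glob(options.input))
```

Here `--runner=DirectRunner` is not consumed by the parser, so it ends up in `pipeline_args`, and the reader pattern becomes `/data/src/*.java`.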

pkg_popularity_pipeline_cloud.py

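Runs the pipeline on Cloud Dataflow; it reads its input from and writes its output to a Cloud Storage bucket.
Set the BUCKET_ID and PROJECT_ID variables to your own bucket and project before running.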
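A minimal sketch of how `PROJECT_ID` and `BUCKET_ID` typically turn into Beam pipeline options for a Dataflow run. The flag names (`--project`, `--job_name`, `--staging_location`, `--temp_location`, `--region`, `--runner=DataflowRunner`) are standard Beam pipeline options; the placeholder values, job name, region default, and the `staging/` and `javahelp/` paths are assumptions, not necessarily what `pkg_popularity_pipeline_cloud.py` uses.

```python
from typing import List, Tuple

# Placeholders: replace with your own GCP project ID and Cloud Storage bucket name.
PROJECT_ID = "my-project-id"
BUCKET_ID = "my-bucket"


def dataflow_args(project_id: str, bucket_id: str, job_name: str = "pkg-popularity",
                  region: str = "us-central1") -> List[str]:
    """Pipeline options for a Dataflow run that stages files in the given bucket."""
    bucket = f"gs://{bucket_id}"
    return [
        f"--project={project_id}",
        f"--job_name={job_name}",
        f"--staging_location={bucket}/staging/",
        f"--temp_location={bucket}/staging/",
        f"--region={region}",  # assumption: newer Beam SDKs expect an explicit region
        "--runner=DataflowRunner",
    ]


def gcs_io_paths(bucket_id: str) -> Tuple[str, str]:
    """Hypothetical input pattern and output prefix inside the bucket."""
    base = f"gs://{bucket_id}/javahelp"
    return f"{base}/*.java", f"{base}/output"


print(dataflow_args(PROJECT_ID, BUCKET_ID))
print(gcs_io_paths(BUCKET_ID))
```

The resulting list can be handed to the pipeline as `beam.Pipeline(argv=dataflow_args(PROJECT_ID, BUCKET_ID))`, with the two paths used for the read and write steps.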
