In this two-day workshop we will explore what Apache Beam and Dataflow are and how you can use them. On the first day we mainly focus on understanding the mechanics and create our own batch pipeline. On the second day we dive deeper into streaming pipelines and the mechanisms Beam provides to deal with their challenges.
[root]
|
└ Lab-1-Lab-1-batch-pipeline | We create our own Batch Pipeline.
└ Lab-2-windowing | We create our Batch Pipeline.
Lab/Folder | Description |
---|---|
df-lab-01 | Batch pipeline that calculates how often a song is played |
df-lab-02 | UNDER CONSTRUCTION |
.. | .. |
To run the pipelines with Dataflow you need to have a GCP project with billing enabled.
- Python 3
- pip
- VS Code (see its installation guide)
- PyCharm (see its installation guide)
- gcloud CLI (see its installation guide)
Preparing your local environment is one of the first steps. All labs build on each participant's local development environment.
- Clone Repository

  Please make sure that you work with the latest main-branch version of our labs repository. If the repository changes during the workshop, you can save your current local changes with git stash and get the new state with git pull.

      $ TEMPDIR=/tmp
      $ cd $TEMPDIR
      $ git clone https://github.com/doitintl/Dataflow-Fundamentals.git
      $ cd Dataflow-Fundamentals
- GCP Credential Configuration (optional)

  Set the default GCP credentials and set the project.

      $ gcloud init
      $ gcloud config set project <YOUR PROJECT>
- Installation requirements

  Make sure you are using Python 3 and have installed pip.

      $ pip install -r requirements.txt
- INIT/RUN Lab

  Go to the corresponding lab's sub-directory and follow the instructions in the documentation stored there.
- https://beam.apache.org/documentation/programming-guide/
- https://tour.beam.apache.org/
- https://beam.apache.org/get-started/resources/learning-resources/#getting-started
See LICENSE for full details.
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
https://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
Copyright © 2024 DoiT International