
Airflow with KubernetesExecutor + GitSync sidecar

Overview

This repository contains the necessary steps and configurations to deploy Airflow with KubernetesExecutor and GitSync sidecar in a Kubernetes cluster. It allows you to run Airflow jobs within a Kubernetes environment.

Project Structure

|-- dags/
|-- helm/
    |-- airflow/
        |-- files/
            |-- webserver_config.py: Airflow Webserver's config file
        |-- templates/
    |-- pv.yaml: PersistentVolume definition
    |-- values.yaml: Contains all values used to deploy to production 
    |-- values_local.yaml: Contains all values used to deploy to a local environment (i.e., with embedded postgres/pgbouncer)
|-- scripts/
    |-- install_k8s_secrets.sh
    |-- install_local_dependencies.sh
    |-- run_airflow_local.sh
|-- src/
    |-- Dockerfile
    |-- requirements.txt
|-- Makefile

The project structure consists of the following directories:

  • dags/: Contains the Airflow DAGs (Directed Acyclic Graphs) that define the workflows and tasks.
  • helm/: Includes the Helm chart configuration files, such as values.yaml, which holds the configuration values for Airflow deployment.
  • scripts/: Contains various scripts for installing Kubernetes secrets, local dependencies, and running Airflow locally.
  • src/: Holds the Dockerfile and requirements.txt file necessary for building the Airflow Docker image.
  • Makefile: Provides convenience commands for building and running the project.

Setup

Prerequisites

Before proceeding, ensure that you have the following prerequisites installed:

  1. Docker: Required for building the Airflow Docker image.
  2. Minikube: Needed for setting up a local Kubernetes cluster.
  3. Make: Required for executing make commands.
  4. Helm: Required for deploying Airflow using the Helm package manager.
  5. Kubectl: Required for connecting to the Kubernetes API.
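
You can quickly verify that all of these tools are available on your machine:

docker --version
minikube version
make --version
helm version
kubectl version --client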

Create PersistentVolume and Secrets

You will also need to set up a PersistentVolume with ReadWriteMany access mode. This is important so all Airflow pods can write their logs to a shared volume.
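
For reference, a ReadWriteMany PersistentVolume backed by NFS could look like the sketch below (the actual definition lives in helm/pv.yaml; the name, capacity, server, and path here are placeholders):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: airflow-logs            # placeholder name
spec:
  capacity:
    storage: 10Gi               # placeholder size
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: nfs.example.com     # placeholder NFS server
    path: /exports/airflow-logs # placeholder export path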

In addition, you need to create the following secrets:

  1. airflow-fernet-key: This secret should contain the Fernet key used for encryption and decryption in Airflow.
  2. airflow-gitsync: This secret should contain the necessary credentials to access your Git repository for syncing DAGs.

To set up the PersistentVolume and Secrets, follow these steps:

  1. Edit the file fernet-key with your desired key (one way to generate a key is shown below).
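
One way to generate a valid Fernet key (assuming the cryptography Python package is installed) is:

python3 -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())" > fernet-key

The Kubernetes secrets themselves are created by scripts/install_k8s_secrets.sh, which make setup-k8s-prerequisites (used in the Local Deployment section below) is expected to invoke.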

Build Docker Image

To build the Airflow image, follow these steps:

  1. Build the Airflow Docker image and push it to the local Docker registry:

    make build
    make push
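
If you are not running a local Docker registry, a common alternative with Minikube (not necessarily what this Makefile does) is to build the image directly against Minikube's Docker daemon, so the cluster can use it without a push:

eval $(minikube docker-env)
make build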

Local Deployment

To deploy Airflow locally, use the following steps:

  1. Start Minikube, create the PersistentVolume and Secrets, then deploy airflow:

    minikube start
    make setup-k8s-prerequisites
    make deploy-local
  2. Monitor the deployment and wait for all Airflow pods to be in a running state:

    kubectl get pods --watch
  3. Once all the pods are running, you can access the Airflow web UI using the following command:

    kubectl port-forward svc/airflow-web 8080:8080
  4. Enable and run Airflow DAGs through the web UI, and monitor the job progress.
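
Once the scheduler is up, you can also inspect the deployment from the command line; for example (the resource name airflow-scheduler is an assumption and depends on your Helm release name):

kubectl logs -f deploy/airflow-scheduler
kubectl exec -it deploy/airflow-scheduler -- airflow dags list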

Remember to clean up the resources after you finish by running:

make cleanup

These steps will set up and deploy Airflow locally using Minikube and Helm, allowing you to test and run Airflow jobs in a local Kubernetes environment.

Production Deployment

To deploy Airflow to a production environment, you need to set up an external Postgres database and ensure that you can create an NFS-backed PersistentVolume.

Then, create a Kubernetes Secret with the Postgres credentials:

kubectl create secret generic airflow-postgres \
    --from-file=postgresql-user=$PWD/postgres-user \
    --from-file=postgresql-password=$PWD/postgres-password

Note that the command above reads the Postgres credentials from local files (postgres-user and postgres-password); do not commit these files.
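
A sketch of preparing those files (the username airflow and the password generation method are just examples; printf avoids a trailing newline ending up in the secret):

printf '%s' 'airflow' > postgres-user
printf '%s' "$(openssl rand -base64 24)" > postgres-password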

After creating the secret, open helm/values.yaml and set the key postgresql.enabled to false, as in the example below:

...
postgresql:
  ## if the `stable/postgresql` chart is used
  ## [FAQ] https://github.com/airflow-helm/charts/blob/main/charts/airflow/docs/faq/database/embedded-database.md
  ## [WARNING] the embedded Postgres is NOT SUITABLE for production deployments of Airflow
  ## [WARNING] consider using an external database with `externalDatabase.*`
  enabled: false
...

Then, add the name of the newly created secret to the externalDatabase section:

## the name of a pre-created secret containing the external database user
## - if set, this overrides `externalDatabase.user`
##
userSecret: ""
...
## the name of a pre-created secret containing the external database password
## - if set, this overrides `externalDatabase.password`
##
passwordSecret: "airflow-postgres"
...
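
Besides the secret names, the chart also needs to know where the external database lives. Under the same externalDatabase block you would set something like the sketch below (host, port, database, and user are placeholders; double-check the exact keys against the chart's values reference):

externalDatabase:
  type: postgres
  host: postgres.example.com    # placeholder: your Postgres host
  port: 5432
  database: airflow             # placeholder database name
  user: airflow                 # placeholder user, or use userSecret instead
  passwordSecret: "airflow-postgres"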

To set up an ingress, go to ingress.web, enable the webserver's ingress, and change the host. If TLS is required, provide the values under ingress.web.tls. Note that there is no need to enable flower, since it is only used by the CeleryExecutor.
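
As a rough sketch of that section (key names follow the airflow-helm chart and should be verified against your chart version; the host and TLS secret name are placeholders):

ingress:
  enabled: true
  web:
    host: "airflow.example.com"   # placeholder hostname
    tls:
      enabled: true
      secretName: "airflow-tls"   # placeholder TLS secret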

Next, configure the gitSync authentication method (SSH or user/password). If you choose SSH and want to use your own id_rsa, there is nothing to change; if you want to use the user/password method, edit the script scripts/install_k8s_secrets.sh and replace the airflow-gitsync secret creation command with the following:

kubectl create secret generic airflow-gitsync --from-file=user=$PWD/github-user --from-file=password=$PWD/github-password

Note that you need to create the files github-user and github-password before applying the changes.
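
Keep in mind that GitHub no longer accepts account passwords for Git over HTTPS, so github-password should contain a personal access token with read access to the repository. A sketch of preparing the files (do not commit them):

printf '%s' 'your-github-username' > github-user
printf '%s' 'ghp_xxxxxxxxxxxxxxxx' > github-password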

Finally, review all values in helm/values.yaml and execute the following command to deploy the application to your cluster (make sure your kubectl context is set to the production cluster):

make deploy-prod
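
Under the hood, make deploy-prod presumably wraps a Helm upgrade against the airflow-helm chart referenced in values.yaml. A hedged sketch of the equivalent manual command (the repository URL is the chart's official one, while the release name and namespace are assumptions; the Makefile may instead install the local chart under helm/airflow):

helm repo add airflow-stable https://airflow-helm.github.io/charts
helm repo update
helm upgrade --install airflow airflow-stable/airflow \
    --namespace airflow --create-namespace \
    --values helm/values.yaml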

References

  1. Airflow - Customizing the image
  2. Install Makefile
  3. Installing Helm
