kiranskmr / workflows_automation

Automation of Databricks workflows

Databricks Workflows Automation and CI/CD integration

How to automate Databricks workflow deployments and integrate them with your CI/CD pipeline.

Workflows Automation

There are several ways to automate workflow deployments:

  • Databricks Asset Bundles (YAML based; a minimal bundle sketch follows this list)
  • PyDABs (Python version of Databricks Asset Bundles)
  • Databricks Terraform Provider
  • Python SDK
  • REST API
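
As a point of reference for the first option, a YAML-based asset bundle starts from a databricks.yml file at the repo root. The sketch below is a minimal, assumed configuration (the bundle name, include path, and workspace URL are placeholders), not this repo's actual databricks.yml:

    # databricks.yml - minimal sketch with placeholder values
    bundle:
      name: workflows_automation

    # Pull in the individual job definitions, e.g. from a dab_resources folder
    include:
      - dab_resources/*.yml

    targets:
      dev:
        # Development mode prefixes resource names per user and pauses schedules,
        # which keeps local, iterative deployments isolated.
        mode: development
        default: true
        workspace:
          host: https://adb-1234567890123456.7.azuredatabricks.net   # placeholder workspace URL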

The repo contains examples of automating your Databricks workflows using Databricks Asset Bundles, Terraform, and PyDABS.

Databricks Asset Bundles

The job definitions under the dab_resources folder show how to deploy different job types in Databricks Workflows; a sketch of what one of these resource files can look like follows the list below.

    - DABS - Customer Order Details - DLT job (dab_resources/dlt_dab_wf_job.yml) 
    - DABS - Customer Order Details - DBT job (dab_resources/dbt_dab_wf_job.yml) 
    - DABS - Python whl File from Volume job (dab_resources/wheel_wf_job.yml) 
    - DABS - MLOPS Write feature table job (dab_resources/feature-engineering-workflow-resource.yml) 
    - DABS - MLOPS Model training job (dab_resources/model-workflow-resource.yml)
    - DABS - MLOPS Batch inference job (dab_resources/batch-inference-workflow-resource.yml) 
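
To give a feel for the shape of these resource files, here is a sketch of a Python wheel job whose library is picked up from a Unity Catalog Volume. The package name, entry point, Volume path, and cluster settings are placeholders, not the values used in dab_resources/wheel_wf_job.yml:

    # Sketch of a wheel-from-Volume job resource; all values are placeholders
    resources:
      jobs:
        wheel_wf_job:
          name: dabs-python-wheel-from-volume
          tasks:
            - task_key: run_wheel
              python_wheel_task:
                package_name: my_package        # placeholder package name
                entry_point: main               # placeholder entry point
              libraries:
                - whl: /Volumes/main/default/libs/my_package-0.1.0-py3-none-any.whl   # placeholder Volume path
              job_cluster_key: job_cluster
          job_clusters:
            - job_cluster_key: job_cluster
              new_cluster:
                spark_version: 14.3.x-scala2.12
                node_type_id: Standard_DS3_v2   # Azure node type; adjust per cloud
                num_workers: 1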

Local Development

Terraform

The Terraform code under terraform/modules/ has examples of deploying the above jobs using Terraform scripts, with each job definition grouped into its own module.

    - Terraform - Customer Order Details - DLT job (terraform/modules/dlt-pipeline)
    - Terraform - Customer Order Details - DBT job (terraform/modules/dbt-pipeline)
    - Terraform - Python whl File from Volume job (terraform/modules/wheel-job)
    - Terraform - MLOPS Model training job (terraform/modules/mlops-job)

Local Development

PyDabs (Private Preview)

The Python-based workflow code under src/jobs has examples of deploying the jobs below using PyDABS.

     - PyDABS - Detect Anomalies job
     - PyDABS - Notebook Job
     - PyDABS - Task Values job

The PyDABS code under src/jobs is used to generate the corresponding YAML files, which are then deployed as part of the asset bundle deployment.

CI/CD integration

The repo has examples of integrating with different CI/CD platforms.

Azure DevOps

There are three pipelines defined for Azure DevOps integration; a sketch of a bundle deployment pipeline follows the list.

  • Deploying Databricks Asset Bundles based workflows to Databricks - azure-pipelines-dab.yml
  • Deploying Terraform-based workflows to Databricks - azure-pipelines-tf.yml
  • Running the pytest test cases - azure-pipelines-testcase.yml
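
The sketch below shows what a bundle deployment pipeline along the lines of azure-pipelines-dab.yml can look like; the trigger branch, target name, and variable names are assumptions, not the repo's actual pipeline definition:

    # Sketch of an Azure DevOps pipeline that validates and deploys the bundle
    trigger:
      branches:
        include:
          - main

    pool:
      vmImage: ubuntu-latest

    steps:
      - script: curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh
        displayName: Install Databricks CLI

      - script: |
          databricks bundle validate
          databricks bundle deploy -t dev
        displayName: Validate and deploy the asset bundle
        env:
          DATABRICKS_HOST: $(DATABRICKS_HOST)     # assumed pipeline variables / variable group entries
          DATABRICKS_TOKEN: $(DATABRICKS_TOKEN)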

GitHub Actions

There are five workflows defined for GitHub Actions integration; a CI workflow sketch follows the list.

  • CI for deploying Databricks Asset Bundles based workflows to Databricks - .github/workflows/dabsDeploymentCI.yml
  • CD for deploying Databricks Asset Bundles based workflows to Databricks - .github/workflows/dabsDeploymentCD.yml
  • CI for deploying Terraform-based workflows to Databricks - .github/workflows/TerraformDeploymentCI.yml
  • CD for deploying Terraform-based workflows to Databricks - .github/workflows/TerraformDeploymentCD.yml
  • Running unit test cases using pytest - .github/workflows/testCase.yml
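
As an illustration, a pull-request CI workflow for the bundle can be as small as the sketch below; the workflow name, branch filter, and secret names are assumptions rather than the contents of dabsDeploymentCI.yml:

    # Sketch of a GitHub Actions CI workflow that validates the bundle on pull requests
    name: dabs-ci
    on:
      pull_request:
        branches: [main]

    jobs:
      validate:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          - uses: databricks/setup-cli@main        # installs the Databricks CLI
          - name: Validate the asset bundle
            run: databricks bundle validate
            env:
              DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST }}
              DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN }}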

The pipelines are triggered according to the workflow below.

A developer works on a feature branch locally in their IDE; all testing and deployment against a Databricks workspace is done from the local environment. A pull request raised against the 'main' branch triggers:

  • The test case pipeline
  • The CI pipeline for DABS/PyDABS
  • The CI pipeline for Terraform

When the pull request is merged to the 'main' branch, it triggers:

  • The CD pipeline for DABS/PyDABS which deploys to the test workspace
  • The CD pipeline for Terraform which deploys to the test workspace

When a pull request is created and merged to the 'release' branch, it triggers (a CD sketch follows the list):

  • The CD pipeline for DABS/PyDABS which deploys to the prod workspace
  • The CD pipeline for Terraform which deploys to the prod workspace
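
A CD workflow for this branch model boils down to deploying the bundle against the matching target on a push to the branch. The sketch below assumes a 'prod' bundle target and prod-specific secret names; it is illustrative, not the repo's dabsDeploymentCD.yml:

    # Sketch of a CD workflow that deploys to the prod target when 'release' is updated
    name: dabs-cd-prod
    on:
      push:
        branches: [release]

    jobs:
      deploy:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          - uses: databricks/setup-cli@main
          - name: Deploy the bundle to the prod target
            run: databricks bundle deploy -t prod
            env:
              DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST_PROD }}     # assumed secret names
              DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN_PROD }}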

Deployment workflow

Running test cases

The repo has examples of running pytest test cases and integrating them into a CI/CD workflow.

Test cases under src/tests are run against a predefined cluster in a Databricks workspace using Databricks Connect.
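
In CI this only needs the host, token, and cluster id exposed to Databricks Connect before pytest runs. The sketch below is a generic workflow with assumed secret names, not the repo's testCase.yml:

    # Sketch of a test workflow; secret names and Python/package versions are assumptions
    name: test-cases
    on:
      pull_request:
        branches: [main]

    jobs:
      pytest:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          - uses: actions/setup-python@v5
            with:
              python-version: "3.10"
          - run: pip install pytest databricks-connect   # pin databricks-connect to the cluster's DBR version in practice
          - name: Run tests against the predefined cluster via Databricks Connect
            run: pytest src/tests
            env:
              DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST }}
              DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN }}
              DATABRICKS_CLUSTER_ID: ${{ secrets.DATABRICKS_CLUSTER_ID }}   # the cluster the tests run against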

Running pytest test cases using Databricks Connect.

Local Environment Setup

Azure DevOps Deployment

GitHub Actions Deployment


