databrickslabs / dbx

🧱 Databricks CLI eXtensions - aka dbx is a CLI tool for development and advanced Databricks workflows management.

Home Page: https://dbx.readthedocs.io


[Question] How to get a GitHub environment variable into the deployment file during CD (dbx deploy ....)

shrinidhikulkarnitw opened this issue

Expected Behavior

I am trying to access environment variables from GitHub environments in the deployment file during the CD pipeline in a GitHub Actions workflow. I have stored the variable as a GitHub environment variable. After a successful dbx deployment, the variable's value should show up in the cluster's environment variables.

Current Behavior

Currently, no value is assigned to the cluster environment variable after a successful deployment.

Steps to Reproduce (for bugs)

I am trying to assign a variable under spark_env_vars in the deployment file, as shown below:

[image: deployment file snippet setting spark_env_vars]

I am then using this cluster config in the workflow config:

[image: workflow config referencing the cluster config]

This is how I stored the environment variable in GitHub:

[image: GitHub environment settings defining the variable]

But after deployment, the value of the environment variable didn't show up:

[image: cluster environment variables after deployment, with no value assigned]

Am I missing anything here? Can you folks please help me out?

Your Environment

DEV

  • dbx version used: latest version
  • Databricks Runtime version: 11.3

hi @shrinidhikulkarnitw
Could you please verify that the variable is correctly set and visible outside of dbx inside the CI/CD pipeline?

e.g. by testing it via echo $SAMPLE_VARIABLE

I'm asking because it seems that no value is passed into the dbx deploy command, and therefore it cannot assign one.
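
For reference, here is a minimal sketch of how the variable could be exposed to the step that runs dbx deploy in a GitHub Actions workflow. This is an assumption about your setup: the workflow layout, secret names, and the use of the vars context are placeholders, not taken from your pipeline (if SAMPLE_VAR is stored as a secret, use secrets.SAMPLE_VAR instead):

# Hypothetical workflow snippet; names and versions are placeholders.
name: cd
on:
  push:
    tags: ["v*"]

jobs:
  deploy:
    runs-on: ubuntu-latest
    environment: DEV                       # the GitHub environment that defines SAMPLE_VAR
    env:
      DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST }}
      DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN }}
      SAMPLE_VAR: ${{ vars.SAMPLE_VAR }}   # or secrets.SAMPLE_VAR if stored as a secret
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
        with:
          python-version: "3.9"
      - name: Install dbx
        run: pip install dbx
      - name: Check that the variable is visible in the runner
        run: echo "SAMPLE_VAR is set to '$SAMPLE_VAR'"
      - name: Deploy
        run: dbx deploy

The important part is that the variable is mapped into the environment of the job or step that actually runs dbx deploy; defining it in the GitHub environment alone is not enough by itself.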

Here are some tests that I've done to verify the behaviour:

File: .dbx/project.json:

{
    "environments": {
        "default": {
            "profile": "...",
            "storage_type": "mlflow",
            "properties": {
                "workspace_directory": "/Shared/dbx/dbx_env_vars",
                "artifact_location": "dbfs:/Shared/dbx/projects/dbx_env_vars"
            }
        }
    },
    "inplace_jinja_support": true,
    "failsafe_cluster_reuse_with_assets": true,
    "context_based_upload_for_execute": false
}
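
As a side note, the inplace_jinja_support: true flag above is what makes dbx render the {{ env[...] }} expressions in the deployment file. If I recall correctly, it can also be enabled via the CLI (please double-check the flag name against the dbx docs):

dbx configure --enable-inplace-jinja-support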

Snippet from deployment.yml:

custom:
  basic-cluster-props: &basic-cluster-props
    spark_version: "11.3.x-cpu-ml-scala2.12"

  basic-static-cluster: &basic-static-cluster
    new_cluster:
      <<: *basic-cluster-props
      num_workers: 1
      node_type_id: "..."
      spark_env_vars:
        SAMPLE_VAR: {{ env["SAMPLE_VAR"] }}

Deploying without any variable provided:

> dbx deploy

Leads to:

[screenshot: deployment result without the variable provided]

Deploying with the variable provided:

> SAMPLE_VAR=some dbx deploy

Leads to:
[screenshot: deployment result with SAMPLE_VAR=some]

Now, an interesting twist: adding quotes.

Snippet from deployment.yml:

custom:
  basic-cluster-props: &basic-cluster-props
    spark_version: "11.3.x-cpu-ml-scala2.12"

  basic-static-cluster: &basic-static-cluster
    new_cluster:
      <<: *basic-cluster-props
      num_workers: 1
      node_type_id: "..."
      spark_env_vars:
        SAMPLE_VAR: "{{ env["SAMPLE_VAR"] }}"

And then (without any variable provided):

> dbx deploy

It reproduces the behaviour you've met.

[screenshot: deployment result reproducing the reported behaviour]

Summary: remove the wrapping quotes around the {{ env["VAR_NAME"] }} statement.
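
Optionally (a sketch on my side, assuming dbx's inplace Jinja rendering behaves like standard Jinja2 here), the default filter can be used so that a missing variable falls back to an explicit value instead of rendering as empty:

      spark_env_vars:
        SAMPLE_VAR: {{ env["SAMPLE_VAR"] | default("not-set") }}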

commented

I have stumbled upon this problem as well while migrating to tag-based releases. This "problem" surfaces in GitLab as well, but the documentation still provides examples with quotation marks. Maybe the documentation should be updated, since it does not work with the quotation marks included?
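
For the GitLab side, a minimal sketch of the same idea (image, stage, and variable names are placeholders; CI/CD variables defined in the GitLab project or group settings are exposed to the job as environment variables automatically, so the unquoted {{ env[...] }} form picks them up at deploy time):

# Hypothetical .gitlab-ci.yml job; adjust image, stage, and variable names.
deploy:
  stage: deploy
  image: python:3.9
  rules:
    - if: $CI_COMMIT_TAG                            # tag-based release
  script:
    - pip install dbx
    - echo "SAMPLE_VAR is set to '$SAMPLE_VAR'"     # SAMPLE_VAR comes from GitLab CI/CD variables
    - dbx deploy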