damoodamoo / hack-databricksterraform

A hack to create a cluster in Databricks using terraform and the shell provider

What does it do?

It creates:

  1. A Databricks workspace in Azure
  2. A personal access token (PAT) for the Databricks API
  3. A cluster in the Databricks workspace

It also stores the cluster_id and pat_token in Terraform state so they persist between runs.
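Persisting cluster_id and pat_token relies on the shell provider reading a JSON object from the script's stdout and keeping it in state. The "no JSON strings found in stdout" line in the Logs section is the provider reporting that nothing was printed. The repo's lifecycle scripts are PowerShell (cluster.ps1). What follows is only a minimal shell sketch of that contract: emit_cluster_state is a hypothetical name, and it assumes the values need no JSON escaping.

```shell
#!/bin/sh
# Hypothetical sketch, not the repo's cluster.ps1: print the values that
# should persist between runs as a single JSON object on stdout, where the
# shell provider looks for state. Any diagnostics should go to stderr so
# they do not interfere with the JSON.
emit_cluster_state() {
  cluster_id="$1"
  pat_token="$2"
  # Assumes neither value contains quotes or backslashes (no JSON escaping).
  printf '{"cluster_id":"%s","pat_token":"%s"}\n' "$cluster_id" "$pat_token"
}
```

Printing anything else to stdout alongside the object risks confusing the provider's JSON parsing, which is why diagnostics belong on stderr.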

Run

  1. Install the Azure CLI and log in with az login. Also install python3, pip3, PowerShell (pwsh) with the psake module, and Terraform.
  2. Run Invoke-psake ./psake.ps1 installRequirements
  3. Run terraform init, then terraform apply -auto-approve -var 'group_name=yourGroupNameHere' (run terraform plan first if you want to see what will be created).
  4. Check the Databricks workspace to confirm the cluster was created.
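A missing tool usually only shows up partway through these steps. A minimal preflight sketch can report every missing tool at once before anything runs. check_requirements is a hypothetical helper, not part of the repo, and its default tool list is an assumption based on the steps above.

```shell
#!/bin/sh
# Hypothetical preflight helper (not part of the repo): report every
# required command-line tool that is missing from PATH.
check_requirements() {
  # Default to the tools the Run steps use; pass names to check others.
  if [ "$#" -eq 0 ]; then
    set -- az python3 pip3 pwsh terraform
  fi
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example: check_requirements && az account show >/dev/null && terraform init && terraform apply -auto-approve -var 'group_name=yourGroupNameHere'. Here az account show fails if the Azure CLI is not logged in. This helper does not check that the psake module is installed.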

Dev

The repo includes a VS Code devcontainer. Clone the repo, open it in VS Code, and reopen it in the devcontainer.

Info: https://code.visualstudio.com/docs/remote/containers

Run Invoke-psake ./make.ps1 test for basic unit tests, or Invoke-psake ./make.ps1 integrationTest for a full integration test run.

Debug

Cluster config examples

The ./examples folder contains example cluster configurations for setting up clusters in different ways.

Cleanup

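The command below runs databricks clusters delete on every cluster listed in the workspace the databricks CLI is configured for, not only the clusters this repo created:

databricks clusters list | awk '{print $1}' | xargs -n 1 databricks clusters delete --cluster-id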
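To clean up only some clusters, one option is to filter the list by name before piping it to xargs. This is a minimal sketch. It assumes the list output has one cluster per line with the ID in the first column (the awk '{print $1}' above relies on this) and the cluster name, without spaces, in the second column. cluster_ids_with_prefix is a hypothetical helper, not part of the repo or the databricks CLI.

```shell
#!/bin/sh
# Hypothetical helper: read `databricks clusters list` output on stdin and
# print only the IDs of clusters whose name starts with the given prefix.
# Assumes column 1 is the cluster ID and column 2 is the name (no spaces).
cluster_ids_with_prefix() {
  prefix="$1"
  # Refuse an empty prefix, which would otherwise risk matching everything.
  if [ -z "$prefix" ]; then
    echo "usage: cluster_ids_with_prefix PREFIX" >&2
    return 2
  fi
  awk -v p="$prefix" 'index($2, p) == 1 { print $1 }'
}
```

Run databricks clusters list | cluster_ids_with_prefix hack- on its own first to preview the IDs. Then append | xargs -n 1 databricks clusters delete --cluster-id to act on only those clusters. The hack- prefix is a hypothetical example.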

Logs

Option 1

Set debug_log to true in the environment block of the Terraform resource you want to debug. Files named cluster.[create|read|update|delete].logs will then appear after the provider executes.

  environment = {
    debug_log        = true
  }
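The debug log under Option 2 shows entries from the environment map (DATABRICKS_HOST, machine_sku, worker_nodes) reaching the script as environment variables, so debug_log presumably arrives the same way. The repo implements the logging in PowerShell. The following hypothetical shell sketch shows how a lifecycle script could honor the flag. The run_step name and the choice to log only stderr are assumptions.

```shell
#!/bin/sh
# Hypothetical sketch (the repo's scripts are PowerShell): run one lifecycle
# operation and, when the debug_log environment variable is "true", append
# its stderr to cluster.<operation>.logs. Stdout is left untouched because
# the shell provider parses it for JSON state.
run_step() {
  operation="$1"
  shift
  if [ "${debug_log:-false}" = "true" ]; then
    "$@" 2>>"cluster.${operation}.logs"
  else
    "$@"
  fi
}
```

Because there is no pipe, the wrapped command's exit status is returned unchanged. The provider can therefore still detect a failed operation.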

Option 2

Set the TF_LOG environment variable to debug, then re-run terraform apply. The debug output shows what happened when the scripts executed.

For example, here is part of the output when the script path is incorrect:

-------------------------
[DEBUG] Command execution completed:
-------------------------
[DEBUG] no JSON strings found in stdout
[DEBUG] Unlocking "shellScriptMutexKey"
[DEBUG] Unlocked "shellScriptMutexKey"
[DEBUG] Reading shell script resource...
-------------------------
[DEBUG] Current stack:
[DEBUG] -- create
[DEBUG] -- read
-------------------------
[DEBUG] Locking "shellScriptMutexKey"
[DEBUG] Locked "shellScriptMutexKey"
[DEBUG] shell script command old state: "&{[DATABRICKS_HOST=https://eastus.azuredatabricks.net machine_sku=Standard_D3_v2 worker_nodes=8 DATABRICKS_TOKEN=ATOKENMIGHTBEHERE] map[]}"
[DEBUG] shell script going to execute: /bin/sh -c
   pwsh -command '& { . ./cluster.ps1; read }'
-------------------------
[DEBUG] Starting execution...
-------------------------
  .: The term './cluster.ps1' is not recognized as the name of a cmdlet, function, script file, or operable program.
  Check the spelling of the name, or if a path was included, verify that the path is correct and try again.
  read: The term 'read' is not recognized as the name of a cmdlet, function, script file, or operable program.
  Check the spelling of the name, or if a path was included, verify that the path is correct and try again.
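Debug logs get long quickly. Terraform's TF_LOG_PATH variable sends them to a file instead, for example: TF_LOG=debug TF_LOG_PATH=./terraform-debug.log terraform apply -auto-approve -var 'group_name=yourGroupNameHere'. The hypothetical extract_commands helper below then prints just the commands the shell provider ran. It assumes the log is laid out like the excerpt above: a "shell script going to execute:" marker, the command on the following lines, then a dashed separator. Real logs may add timestamps or prefixes, so the patterns may need adjusting.

```shell
#!/bin/sh
# Hypothetical helper: print the commands the shell provider executed,
# given one or more TF_LOG=debug log files (or stdin if none are given).
extract_commands() {
  awk '
    /shell script going to execute:/ {
      grab = 1
      # Keep any command text that follows the marker on the same line.
      line = $0
      sub(/.*shell script going to execute:[ \t]*/, "", line)
      if (line != "") print line
      next
    }
    # A dashed separator line ends the command block.
    /-----[ \t]*$/ { grab = 0 }
    grab {
      sub(/^[ \t]+/, "")
      if ($0 != "") print
    }
  ' "$@"
}
```

Run against the excerpt above, it would print /bin/sh -c followed by the pwsh command. That makes it easier to see that ./cluster.ps1 was resolved relative to the wrong working directory.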

About

License: MIT


Languages: PowerShell 75.8%, Dockerfile 16.0%, HCL 8.2%