databrickslabs / dbx

🧱 Databricks CLI eXtensions - aka dbx is a CLI tool for development and advanced Databricks workflows management.

Home Page: https://dbx.readthedocs.io


Using the same workspace directory between different environments

WmWessels opened this issue

Expected Behavior

I would like to define two environments in .dbx/project.json that share the same workspace directory but use different artifact locations.

Current Behavior

When I deploy my Python project using dbx in our CI/CD pipeline, the deployment fails with the following exception:

Exception: Required location of experiment /Shared/dbx/ doesn't match
the project defined one.

Steps to Reproduce (for bugs)

Create a dbx project. In .dbx/project.json, define two environments that share the same workspace directory but use different artifact locations (a sketch is given below).
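For reference, this is roughly how our .dbx/project.json is structured; environment names, profile, and paths are placeholders, not the actual values from our project:

```json
{
    "environments": {
        "train": {
            "profile": "my-profile",
            "storage_type": "mlflow",
            "properties": {
                "workspace_directory": "/Shared/dbx/my_project",
                "artifact_location": "dbfs:/Shared/dbx/my_project/train_artifacts"
            }
        },
        "score": {
            "profile": "my-profile",
            "storage_type": "mlflow",
            "properties": {
                "workspace_directory": "/Shared/dbx/my_project",
                "artifact_location": "dbfs:/Shared/dbx/my_project/score_artifacts"
            }
        }
    }
}
```

Both environments point at the same workspace_directory while the artifact_location differs, which is the combination that triggers the exception above.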

Then create two deployment files (one for training, one for scoring). The first deployment file defines a workflow under the first environment; the second defines a workflow under the second environment (see the sketch after this paragraph).
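The training deployment file looks roughly like the sketch below (workflow name, package, entry point, and cluster settings are placeholders); the scoring file is analogous but targets the second environment:

```yaml
# conf/deployment_train.yml -- placeholder names throughout
build:
  python: "pip"
environments:
  train:                              # must match an environment key in .dbx/project.json
    workflows:
      - name: "my-project-train"
        job_clusters:
          - job_cluster_key: "default"
            new_cluster:
              spark_version: "12.2.x-cpu-ml-scala2.12"
              node_type_id: "i3.xlarge"   # placeholder node type
              num_workers: 1
        tasks:
          - task_key: "train"
            job_cluster_key: "default"
            python_wheel_task:
              package_name: "my_project"
              entry_point: "train"
```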

Finally, deploy both:

  • dbx deploy --deployment-file <deployment_file_train>
  • dbx deploy --deployment-file <deployment_file_score>

Context

We want to version our ML code in production. We currently have a training workflow and a scoring workflow (the training workflow stores the trained models, and the scoring workflow reads them back). As such, we would like the training and scoring workflows to use the same workspace directory. However, we also want different artifact locations, so that we can version our code and avoid the training and scoring workflows running against the same code version.

How would I need to structure my project.json in order to get this to work?

Your Environment

  • dbx version used: 0.8.17
  • Databricks Runtime version: 12.2