MLflow Recipes Examples

This repository contains example projects for MLflow Recipes (previously known as MLflow Pipelines). To learn about a specific recipe, follow the installation instructions below to install all necessary packages, then check out the relevant example projects listed here.

Note: This example repository is intended for first-time MLflow Recipes users who want to learn its fundamental concepts and workflows. Users already familiar with MLflow Recipes should instead start from a template repository that targets a specific ML problem. For example, for a regression problem, use recipes-regression-template.

Note: MLflow Recipes is an experimental feature in MLflow. If you observe any issues, please report them here. For suggestions on improvements, please file a discussion topic here. Your contribution to MLflow Recipes is greatly appreciated by the community!

Example Projects

Installation instructions

To use MLflow Recipes in this example repository, simply install the packages listed in the requirements.txt file. Note that Python 3.8 or above is required.

pip install -r requirements.txt
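Before installing, it can help to confirm that the interpreter meets the Python 3.8 requirement. The following is a minimal sketch, assuming `python3` is the interpreter on your PATH; the function name `check_python_version` is hypothetical and not part of this repository.

```shell
# Hypothetical helper: succeeds only if python3 is version 3.8 or above.
check_python_version() {
  python3 -c 'import sys; sys.exit(0 if sys.version_info >= (3, 8) else 1)'
}

if check_python_version; then
  echo "Python version OK"
else
  echo "Python 3.8 or above is required" >&2
fi
```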

You may need to install additional libraries for extra features:

Hyperopt is required for hyperparameter tuning.
PySpark is required for distributed training or to ingest Spark tables.
Delta is required to ingest Delta tables.

These libraries are available natively in the Databricks Runtime for Machine Learning.
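To see which of these optional libraries are already present in your environment, a quick check along the following lines may be useful. This is only a sketch: the helper name `check_optional_libs` is hypothetical, and it assumes the libraries are importable under the module names `hyperopt`, `pyspark`, and `delta`.

```shell
# Hypothetical helper: report whether each optional module can be found.
# Assumes the import names are hyperopt, pyspark, and delta.
check_optional_libs() {
  python3 - <<'EOF'
import importlib.util

for module in ("hyperopt", "pyspark", "delta"):
    found = importlib.util.find_spec(module) is not None
    print(f"{module}: {'installed' if found else 'missing'}")
EOF
}

check_optional_libs
```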

Log to the designated MLflow Experiment

To log recipe runs to a particular MLflow experiment:

Open profiles/databricks.yaml or profiles/local.yaml, depending on your environment.
Edit (and uncomment, if necessary) the experiment section, specifying the name of the desired experiment for logging.
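To locate the experiment section before editing, you could print it from each profile. The sketch below assumes it is run from an example root directory that contains the profiles folder; the helper name `show_experiment_section` is hypothetical. Because it matches the word "experiment" anywhere on a line, it also shows a section that is still commented out.

```shell
# Hypothetical helper: print lines mentioning "experiment" (plus the
# three lines that follow) from each profile file, if the file exists.
show_experiment_section() {
  for profile in profiles/local.yaml profiles/databricks.yaml; do
    if [ -f "$profile" ]; then
      echo "== $profile =="
      grep -n -A 3 'experiment' "$profile" || echo "(no experiment section found)"
    else
      echo "== $profile: not found =="
    fi
  done
}

show_experiment_section
```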

Development Environment -- Databricks

Sync this repository with Databricks Repos and run the notebooks/databricks notebook on a Databricks cluster. The cluster must run version 11.0 or greater of the Databricks Runtime or the Databricks Runtime for Machine Learning, with workspace files support enabled.

Note: When making changes to recipes on Databricks, we recommend editing files on your local machine and using dbx to sync them to Databricks Repos, as demonstrated here.

Note: Data profile displays in step cards are not visually compatible with the dark theme. Please avoid using the dark theme if possible.

Accessing MLflow Recipe Runs

You can find MLflow Experiments and MLflow Runs created by the recipe on the Databricks ML Experiments page.

Development Environment -- Local machine

Jupyter

Launch the Jupyter Notebook environment via the jupyter notebook command.
Open and run the notebooks/jupyter.ipynb notebook in the Jupyter environment.

Note: Data profile displays in step cards are not visually compatible with the dark theme. Please avoid using the dark theme if possible.

Command-Line Interface (CLI)

First, enter the root directory of the example you want to run and set the profile via an environment variable. For example, for the regression example project:

cd regression

export MLFLOW_RECIPES_PROFILE=local

Then, try running the following MLflow Recipes CLI commands to get started. Note that the --step argument is optional. Recipe commands without a --step specified act on the entire recipe instead.

Available step names are: ingest, split, transform, train, evaluate and register.

Display the help message:

mlflow recipes --help

Run a recipe step or the entire recipe:

mlflow recipes run --step step_name

Inspect a step card or the recipe dependency graph:

mlflow recipes inspect --step step_name

Clean a step cache or all step caches:

mlflow recipes clean --step step_name
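The commands above can be combined to run the recipe one step at a time, in the order the step names are listed. The following is a sketch only: the `run` and `run_all_steps` helpers and the `DRY_RUN` variable are hypothetical conveniences, not part of the MLflow CLI. With `DRY_RUN=1` (the default here) the commands are printed instead of executed.

```shell
# Hypothetical dry-run switch: 1 prints commands, 0 executes them.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical wrapper: echo the command in dry-run mode, otherwise run it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Run each recipe step in order, stopping at the first failure.
run_all_steps() {
  for step in ingest split transform train evaluate register; do
    run mlflow recipes run --step "$step" || return 1
  done
}

run_all_steps
```

Set `DRY_RUN=0` to execute the steps for real; this assumes MLflow is installed and that you are in an example root directory with MLFLOW_RECIPES_PROFILE set.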

Accessing MLflow Recipe Runs

To view MLflow Experiments and MLflow Runs created by the recipe:

Enter the example root directory, for example: cd regression
Start the MLflow UI:

mlflow ui \
   --backend-store-uri sqlite:///metadata/mlflow/mlruns.db \
   --default-artifact-root ./metadata/mlflow/mlartifacts \
   --host localhost
