theckang / aro-odh-demo

Azure Red Hat OpenShift (ARO) demo using Open Data Hub (ODH) and Azure data services

Overview

ARO demo using Open Data Hub and Azure data services - Azure Data Lake and Azure Blob Storage.

Prerequisites

Setup

Download the Iris dataset.
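For example, the dataset can be pulled straight from the UCI Machine Learning Repository (the URL below is an assumption; adjust it if the dataset has moved):

# Fetch iris.data from the UCI repository (URL is an assumption)
curl -o iris.data https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data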

Create a storage account with Azure Data Lake.
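A minimal sketch with the Azure CLI, assuming placeholder resource group and account names; --hns enables the hierarchical namespace that Azure Data Lake Storage Gen2 requires:

# Create a Data Lake Gen2-capable storage account (names are placeholders)
az storage account create \
  --name <storage-account-name> \
  --resource-group <resource-group> \
  --kind StorageV2 \
  --hns true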

Create a service principal (a sketch with the Azure CLI follows this list):

  • Make sure to assign the Storage Blob Data Contributor role to the service principal
  • Create a new application secret for authenticating the service principal
  • Copy down the client-id, tenant-id, and client-secret values (you will need these later)
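A sketch of the above; the service principal name and scope path are placeholders. In the JSON output, appId, tenant, and password correspond to client-id, tenant-id, and client-secret:

# Create a service principal scoped to the storage account (names are placeholders)
az ad sp create-for-rbac \
  --name aro-odh-demo-sp \
  --role "Storage Blob Data Contributor" \
  --scopes "/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Storage/storageAccounts/<storage-account-name>"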

View your account access key and copy down the storage account's connection string (you will need it later).
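One way to retrieve it with the Azure CLI (names are placeholders):

# Print the storage account's connection string (names are placeholders)
az storage account show-connection-string \
  --name <storage-account-name> \
  --resource-group <resource-group> \
  --query connectionString --output tsv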

Download azcopy.
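On Linux, one possible sketch (the download shortlink is an assumption; see Microsoft's azcopy docs for the current link):

# Fetch and unpack azcopy v10 for Linux
curl -L https://aka.ms/downloadazcopy-v10-linux -o azcopy.tar.gz
tar -xzf azcopy.tar.gz --strip-components=1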

Upload the Iris dataset to Azure Data Lake

Replace <tenant-id> and <storage-account-name> below with your own values:

azcopy login --tenant-id=<tenant-id> 
azcopy make 'https://<storage-account-name>.dfs.core.windows.net/mycontainer'
azcopy copy iris.data 'https://<storage-account-name>.dfs.core.windows.net/mycontainer/sample/iris.data'

Configure anonymous access to storage container mycontainer.
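A sketch with the Azure CLI, assuming blob public access is permitted at the account level:

# Allow anonymous read access to blobs in mycontainer (connection string from earlier)
az storage container set-permission \
  --name mycontainer \
  --public-access blob \
  --connection-string "<connection-string>"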

Launch JupyterHub

echo $(oc get route jupyterhub -n odh --template='http://{{.spec.host}}')

Select the s2i-spark-minimal-notebook image and spawn the server. Leave the other settings as they are.

Upload the model_pipeline.ipynb notebook. Set the variables in the second cell where it says ### ENTER YOUR DETAILS ###.

TODO

  • Mount a secret with the env variables for the client-id, tenant-id, and client-secret values (see the sketch after this list)
  • Add Kubeflow on Tekton pipeline
  • Add model validation and model update to the pipeline
  • Add Spark connection to Azure Data Lake
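For the first item, a possible starting point (the secret name and key names are hypothetical):

# Create a secret holding the service principal credentials (names are hypothetical)
oc create secret generic azure-sp-creds -n odh \
  --from-literal=AZURE_CLIENT_ID=<client-id> \
  --from-literal=AZURE_TENANT_ID=<tenant-id> \
  --from-literal=AZURE_CLIENT_SECRET=<client-secret>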


License: Apache License 2.0


Languages

Jupyter Notebook 84.9%, Python 15.1%