Azure Databricks Hands-on (Tutorials)
Follow the instructions in each notebook below.
- Storage Settings
- Basics of PySpark, Spark Dataframe, and Spark Machine Learning
- Spark Machine Learning Pipeline
- Hyper-parameter Tuning
- MLeap (requires ML runtime)
- Horovod Runner with TensorFlow (requires ML runtime)
- Structured Streaming (Basic)
- Structured Streaming with Azure EventHub or Kafka
- Delta Lake
- MLflow (requires ML runtime)
- Orchestration with Azure Data Services
- Delta Live Tables
How to start
- Create an Azure Databricks resource in Microsoft Azure. After the resource is created, launch the Databricks workspace UI by clicking "Launch Workspace".
- Create a compute (cluster) in the Databricks UI. (Select the "Compute" menu and proceed.) Databricks Runtime 10.2 ML or above is recommended for this tutorial.
- Download HandsOn.dbc and import it into your workspace as follows:
  - Select "Workspace" in the workspace UI.
  - Go to your user folder, click your e-mail address (the arrow icon), and then select the "Import" command.
  - Pick HandsOn.dbc to import.
- Open each notebook and attach the compute (your cluster) created above. (Select the compute at the top of each notebook.)
- Run the "Exercise 01 : Storage Settings (Prepare)" notebook first, before running any other notebook.
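If you prefer the command line over the workspace UI, the import step can also be sketched with the legacy Databricks CLI. This is a minimal sketch, assuming you have installed the CLI and generated a personal access token; the target user folder path is an example placeholder, not a value from this tutorial:

```shell
# Install and authenticate the (legacy) Databricks CLI first:
#   pip install databricks-cli
#   databricks configure --token   # prompts for your workspace URL and personal access token

# Import the downloaded archive into your user folder as a DBC archive.
# Replace the e-mail address with your own workspace user name (example value).
databricks workspace import \
  --format DBC \
  ./HandsOn.dbc \
  /Users/your-name@example.com/HandsOn
```

After the import completes, the notebooks appear under the target folder in the "Workspace" view, just as with the UI-based import.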
Note : You cannot use an Azure trial (free) subscription because of its limited quota. Please upgrade to pay-as-you-go if you are on an Azure free subscription. (Your remaining credit is preserved even after you transition to pay-as-you-go.)
Tsuyoshi Matsuzaki @ Microsoft