edfine / azure-databricks-exercise

Azure Databricks Hands-on (Tutorials)

Follow the instructions in each notebook below. (A minimal PySpark sketch after the list gives a flavor of what the early notebooks cover.)

  1. Storage Settings
  2. Basics of Pyspark and Spark Machine Learning
  3. Spark Machine Learning Pipeline
  4. Hyper-parameter Tuning
  5. MLeap (requires ML runtime)
  6. Horovod Runner on Databricks Runtime for ML (requires ML runtime)
  7. Structured Streaming (Basic)
  8. Structured Streaming with Azure EventHub or Kafka
  9. Delta Lake
  10. Work with MLFlow (requires ML runtime)
  11. Orchestration with Azure Data Services
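As a taste of what notebooks 2-4 cover, here is a minimal, hypothetical sketch of a Spark ML pipeline with grid-search tuning. It is not taken from the notebooks: the synthetic data, feature names, and parameter grid are illustrative assumptions, and it expects to run in a Databricks notebook where spark is already defined.

    # Hypothetical sketch of a Spark ML pipeline with hyper-parameter tuning.
    # Assumes a Databricks notebook where `spark` (SparkSession) is predefined.
    import random

    from pyspark.ml import Pipeline
    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.evaluation import BinaryClassificationEvaluator
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

    # Synthetic data: two numeric features and a binary label (illustrative only).
    rows = [(random.random() * 3, random.random() * 3, float(i % 2)) for i in range(40)]
    df = spark.createDataFrame(rows, ["f1", "f2", "label"])

    # Pipeline: assemble the feature vector, then fit a logistic regression model.
    assembler = VectorAssembler(inputCols=["f1", "f2"], outputCol="features")
    lr = LogisticRegression(featuresCol="features", labelCol="label")
    pipeline = Pipeline(stages=[assembler, lr])

    # Hyper-parameter tuning: grid search over regParam with 3-fold cross-validation.
    grid = ParamGridBuilder().addGrid(lr.regParam, [0.01, 0.1]).build()
    cv = CrossValidator(
        estimator=pipeline,
        estimatorParamMaps=grid,
        evaluator=BinaryClassificationEvaluator(labelCol="label"),
        numFolds=3,
    )
    model = cv.fit(df)
    print(model.avgMetrics)  # average AUC for each point in the grid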

Before you start

  • Create an Azure Databricks resource in Microsoft Azure and launch the workspace. See your instructor or the Quickstart for details.

  • Create a computing cluster in your Databricks workspace. (Select "Compute" in the workspace UI.)
    Databricks Runtime 10.2 ML or above is recommended for running this tutorial. (A cluster-creation sketch using the REST API follows this list.)

  • Download HandsOn.dbc and import it into your workspace. (A programmatic import using the Workspace API is also sketched after this list.)

    • Select "Workspace" in the workspace UI.
    • Go to your user folder.
    • Click the arrow to the right of your e-mail address and select "Import" to import HandsOn.dbc.
  • Open the imported notebook and attach your cluster. (Select the cluster at the top of the notebook.)
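For reference, the cluster from the step above can also be created through the Databricks Clusters REST API rather than the Compute UI. This is a hedged sketch, not part of the original hands-on: the workspace URL, personal access token, node type, and the exact ML runtime version string are placeholder assumptions; check the values your workspace actually offers (for example in the Compute UI or via the /api/2.0/clusters/spark-versions endpoint).

    # Hypothetical sketch: create an ML-runtime cluster via the Clusters REST API.
    # Host, token, node type, and Spark version string are placeholders --
    # verify them against your own workspace before running.
    import requests

    host = "https://<databricks-instance>.azuredatabricks.net"
    token = "<personal-access-token>"

    resp = requests.post(
        f"{host}/api/2.0/clusters/create",
        headers={"Authorization": f"Bearer {token}"},
        json={
            "cluster_name": "handson-cluster",
            "spark_version": "10.2.x-cpu-ml-scala2.12",  # assumed ML runtime id
            "node_type_id": "Standard_DS3_v2",            # assumed Azure VM size
            "num_workers": 1,
            "autotermination_minutes": 60,
        },
    )
    resp.raise_for_status()
    print(resp.json())  # contains the new cluster_id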
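Likewise, HandsOn.dbc can be imported with the Workspace REST API instead of the UI. Again a hedged sketch: the workspace URL, token, and target path are placeholders, and it assumes HandsOn.dbc has already been downloaded to the current directory.

    # Hypothetical sketch: import HandsOn.dbc via the Workspace REST API.
    # Host, token, and target path are placeholders.
    import base64

    import requests

    host = "https://<databricks-instance>.azuredatabricks.net"
    token = "<personal-access-token>"
    target_path = "/Users/<your-email>/HandsOn"

    # The import endpoint expects the archive content as a base64 string.
    with open("HandsOn.dbc", "rb") as f:
        content = base64.b64encode(f.read()).decode("utf-8")

    resp = requests.post(
        f"{host}/api/2.0/workspace/import",
        headers={"Authorization": f"Bearer {token}"},
        json={"path": target_path, "format": "DBC", "content": content},
    )
    resp.raise_for_status()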

Note : You cannot use an Azure Trial (Free) subscription because of its limited vCPU quota. Please upgrade to Pay-As-You-Go if you are using a trial subscription. (Your remaining trial credit is preserved when you switch to Pay-As-You-Go.)

Additional resources for further exploration

Modified by Ed Fine (@Afinepoint). Links to the code are provided to keep it up to date. Original code by Tsuyoshi Matsuzaki @ Microsoft.
