danielsef / azure-databricks-exercise

Azure Databricks Hands-on (Tutorials)

Follow the instructions in each notebook below.

  1. Storage Settings
  2. Basics of PySpark, Spark Dataframe, and Spark Machine Learning
  3. Spark Machine Learning Pipeline
  4. Hyper-parameter Tuning
  5. MLeap (requires ML runtime)
  6. Horovod Runner with TensorFlow (requires ML runtime)
  7. Structured Streaming (Basic)
  8. Structured Streaming with Azure EventHub or Kafka
  9. Delta Lake
  10. MLflow (requires ML runtime)
  11. Orchestration with Azure Data Services
  12. Delta Live Tables

How to start

  • Create an Azure Databricks resource in Microsoft Azure.
    After the resource is created, launch the Databricks workspace UI by clicking "Launch Workspace".
  • Create a compute (cluster) in the Databricks UI. (Select the "Compute" menu and proceed.)
    Databricks Runtime version 10.2 ML or above is recommended for this tutorial.
  • Download HandsOn.dbc and import it into your workspace as follows.
    • Select "Workspace" in the workspace UI.
    • Go to your user folder, click the arrow icon next to your e-mail, and select the "Import" command.
    • Select HandsOn.dbc to import.
  • Open each notebook and attach the compute (your cluster) created above. (Select the compute at the top of each notebook.)
  • Run the "Exercise 01 : Storage Settings (Prepare)" notebook first, before running the other notebooks.
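
As an alternative to the UI import above, the .dbc archive can also be imported with the legacy Databricks CLI. This is a hedged sketch, not part of the original instructions: the workspace path below is a placeholder for your own user folder, and it assumes the CLI has been installed (`pip install databricks-cli`) and configured with `databricks configure --token`.

```shell
#!/bin/sh
# Sketch: import HandsOn.dbc via the legacy Databricks CLI instead of the UI.
# Assumes the CLI is installed and configured beforehand; the destination
# workspace path is a placeholder for your own user folder.
SRC="./HandsOn.dbc"
DEST="/Users/you@example.com/HandsOn"

if command -v databricks >/dev/null 2>&1; then
  # --format DBC imports the archive with its folder structure intact.
  databricks workspace import --format DBC "$SRC" "$DEST"
else
  echo "databricks CLI not found; install with: pip install databricks-cli"
fi
```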

Note : You cannot use an Azure trial (free) subscription because of its limited quota. Please upgrade to pay-as-you-go if you are on an Azure free subscription. (Your remaining credit is retained even after you transition to pay-as-you-go.)

Tsuyoshi Matsuzaki @ Microsoft
