edfine / azure-databricks-exercise

Azure Databricks Hands-on (Tutorials)

Follow the instructions in each notebook below. (A minimal PySpark sketch after the list gives a flavor of what the early notebooks cover.)

  1. Storage Settings
  2. Basics of Pyspark and Spark Machine Learning
  3. Spark Machine Learning Pipeline
  4. Hyper-parameter Tuning
  5. MLeap (requires ML runtime)
  6. Horovod Runner on Databricks Runtime for ML (requires ML runtime)
  7. Structured Streaming (Basic)
  8. Structured Streaming with Azure EventHub or Kafka
  9. Delta Lake
  10. Work with MLFlow (requires ML runtime)
  11. Orchestration with Azure Data Services
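As a taste of what notebooks 2-4 cover, here is a minimal, hypothetical sketch of a Spark ML pipeline with grid-search tuning. It is not taken from the notebooks: the synthetic data, feature names, and parameter grid are illustrative assumptions, and it expects to run in a Databricks notebook where spark is already defined.

    # Hypothetical sketch of a Spark ML pipeline with hyper-parameter tuning.
    # Assumes a Databricks notebook where `spark` (SparkSession) is predefined.
    import random

    from pyspark.ml import Pipeline
    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.evaluation import BinaryClassificationEvaluator
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

    # Synthetic data: two numeric features and a binary label (illustrative only).
    rows = [(random.random() * 3, random.random() * 3, float(i % 2)) for i in range(40)]
    df = spark.createDataFrame(rows, ["f1", "f2", "label"])

    # Pipeline: assemble the feature vector, then fit a logistic regression model.
    assembler = VectorAssembler(inputCols=["f1", "f2"], outputCol="features")
    lr = LogisticRegression(featuresCol="features", labelCol="label")
    pipeline = Pipeline(stages=[assembler, lr])

    # Hyper-parameter tuning: grid search over regParam with 3-fold cross-validation.
    grid = ParamGridBuilder().addGrid(lr.regParam, [0.01, 0.1]).build()
    cv = CrossValidator(
        estimator=pipeline,
        estimatorParamMaps=grid,
        evaluator=BinaryClassificationEvaluator(labelCol="label"),
        numFolds=3,
    )
    model = cv.fit(df)
    print(model.avgMetrics)  # average AUC for each point in the grid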

Before you start

  • Create an Azure Databricks resource in Microsoft Azure and launch the workspace. See your instructor or the Quickstart for details.

  • Create a computing cluster in your Databricks workspace. (Select "Compute" in the workspace UI.)
    Databricks Runtime 10.2 ML or above is recommended for running this tutorial. (A cluster-creation sketch using the REST API follows this list.)

  • Download HandsOn.dbc and import it into your workspace. (A programmatic import using the Workspace API is also sketched after this list.)

    • Select "Workspace" in the workspace UI.
    • Go to your user folder.
    • Click the arrow to the right of your e-mail address and select "Import" to import HandsOn.dbc.
  • Open the imported notebook and attach your cluster. (Select the cluster at the top of the notebook.)
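For reference, the cluster from the step above can also be created through the Databricks Clusters REST API rather than the Compute UI. This is a hedged sketch, not part of the original hands-on: the workspace URL, personal access token, node type, and the exact ML runtime version string are placeholder assumptions; check the values your workspace actually offers (for example in the Compute UI or via the /api/2.0/clusters/spark-versions endpoint).

    # Hypothetical sketch: create an ML-runtime cluster via the Clusters REST API.
    # Host, token, node type, and Spark version string are placeholders --
    # verify them against your own workspace before running.
    import requests

    host = "https://<databricks-instance>.azuredatabricks.net"
    token = "<personal-access-token>"

    resp = requests.post(
        f"{host}/api/2.0/clusters/create",
        headers={"Authorization": f"Bearer {token}"},
        json={
            "cluster_name": "handson-cluster",
            "spark_version": "10.2.x-cpu-ml-scala2.12",  # assumed ML runtime id
            "node_type_id": "Standard_DS3_v2",            # assumed Azure VM size
            "num_workers": 1,
            "autotermination_minutes": 60,
        },
    )
    resp.raise_for_status()
    print(resp.json())  # contains the new cluster_id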
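Likewise, HandsOn.dbc can be imported with the Workspace REST API instead of the UI. Again a hedged sketch: the workspace URL, token, and target path are placeholders, and it assumes HandsOn.dbc has already been downloaded to the current directory.

    # Hypothetical sketch: import HandsOn.dbc via the Workspace REST API.
    # Host, token, and target path are placeholders.
    import base64

    import requests

    host = "https://<databricks-instance>.azuredatabricks.net"
    token = "<personal-access-token>"
    target_path = "/Users/<your-email>/HandsOn"

    # The import endpoint expects the archive content as a base64 string.
    with open("HandsOn.dbc", "rb") as f:
        content = base64.b64encode(f.read()).decode("utf-8")

    resp = requests.post(
        f"{host}/api/2.0/workspace/import",
        headers={"Authorization": f"Bearer {token}"},
        json={"path": target_path, "format": "DBC", "content": content},
    )
    resp.raise_for_status()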

Note : You cannot use an Azure Trial (Free) subscription because of its limited vCPU quota. Please upgrade to Pay-As-You-Go if you are using a trial subscription. (Your remaining trial credit is preserved when you switch to Pay-As-You-Go.)

Additional resources for further exploration

Modified by Ed Fine (@Afinepoint). Links to the code are provided to keep it up to date. Original code by Tsuyoshi Matsuzaki @ Microsoft.
