shenna2017 / azure-databricks-exercise

Azure Databricks Hands-on Exercise

Please download HandsOn.dbc and import it into your workspace on Azure Databricks.

Exercise 05 and Exercise 06 both require Runtime Version 5.1 ML or above.
Exercise 09 (Databricks Delta) requires the Premium tier. (All other exercises also run on the Standard tier.)
Exercise 10 (MLflow) requires Runtime Version 5.4 or above.
All other exercises require Databricks Runtime Version 5.1 (includes Spark 2.4.0, Scala 2.11) or above.
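
The requirements above can be summarized as a small lookup. The following is only an illustrative sketch in plain Python, not part of HandsOn.dbc: the names `Requirement`, `REQUIREMENTS`, `parse_version`, and `can_run` are hypothetical, and the runtime version is assumed to be passed in as a string such as "5.4". The Premium tier requirement of Exercise 09 concerns the workspace rather than the runtime, so it is not checked here.

```python
from typing import NamedTuple, Tuple


class Requirement(NamedTuple):
    min_version: Tuple[int, int]  # minimum (major, minor) runtime version
    needs_ml: bool                # True if an ML runtime is required


# Default for exercises without a special requirement (Runtime 5.1 or above).
DEFAULT = Requirement((5, 1), False)

# Exercises with stricter requirements, as listed in this README.
REQUIREMENTS = {
    5: Requirement((5, 1), True),    # MLeap: 5.1 ML or above
    6: Requirement((5, 1), True),    # Horovod: 5.1 ML or above
    10: Requirement((5, 4), False),  # MLflow: 5.4 or above
}


def parse_version(text: str) -> Tuple[int, int]:
    """Turn a version string such as '5.4' into (5, 4)."""
    major, minor = text.split(".")[:2]
    return int(major), int(minor)


def can_run(exercise: int, runtime_version: str, is_ml_runtime: bool) -> bool:
    """Return True if the given runtime satisfies the exercise's requirement."""
    req = REQUIREMENTS.get(exercise, DEFAULT)
    if req.needs_ml and not is_ml_runtime:
        return False
    return parse_version(runtime_version) >= req.min_version


if __name__ == "__main__":
    # Report which of the 11 exercises a hypothetical 5.1 non-ML cluster can run.
    for n in range(1, 12):
        print(n, can_run(n, "5.1", is_ml_runtime=False))
```

Under these assumptions, a single cluster on Runtime 5.4 ML meets the runtime requirement of every exercise, so one ML cluster is enough for the whole set.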

Follow the instructions in each notebook below.

  1. Storage Settings
  2. Basics of PySpark and Spark Machine Learning
  3. Spark Machine Learning Pipeline
  4. Hyper-parameter Tuning
  5. MLeap (needs ML runtime)
  6. Horovod Estimator on Databricks Runtime for ML (needs ML runtime)
  7. Structured Streaming
  8. Structured Streaming with Azure Event Hubs or Kafka
  9. Delta Lake
  10. Work with MLflow
  11. Orchestration with Azure Data Services

These samples were tested on Databricks Runtime Version 5.4 ML.

Note: With an Azure Trial (Free) subscription, you cannot run a cluster because of the limited vCPU quota. Please upgrade to Pay-As-You-Go. (Your remaining free credit is kept even after you switch to Pay-As-You-Go.)

Tsuyoshi Matsuzaki @ Microsoft
