manujjoshi / Databricks

Hands-on sheets for Databricks

Databricks Repository:

1. Data Engineering with Databricks

Data professionals from all walks of life will benefit from this comprehensive introduction to the components of the Databricks Lakehouse Platform that directly support putting ETL pipelines into production. Students will leverage SQL and Python to define and schedule pipelines that incrementally process new data from a variety of data sources to power analytic applications and dashboards in the Lakehouse. This course offers hands-on instruction in Databricks Data Science & Engineering Workspace, Databricks SQL, Delta Live Tables, Databricks Repos, Databricks Task Orchestration, and the Unity Catalog.

Objective:
  • Leverage the Databricks Lakehouse Platform to perform core responsibilities for data pipeline development
  • Use SQL and Python to write production data pipelines to extract, transform, and load data into tables and views in the Lakehouse
  • Simplify data ingestion and incremental change propagation using Databricks-native features and syntax, including Delta Live Tables
  • Orchestrate production pipelines to deliver fresh results for ad-hoc analytics and dashboarding

2. Spark Programming with Databricks

Objective:
  • Identify core features of Spark and Databricks.
  • Describe how DataFrames are created and evaluated in Spark.
  • Apply the DataFrame transformation API to process and analyze data.
  • Demonstrate how Spark is optimized and executed on a cluster.
  • Apply Delta and Structured Streaming to process streaming data.

3. F1 Project with Databricks

  • Udemy course on Data Engineering with Databricks.
  • End-to-end execution.

4. Machine Learning with Databricks Associate

Practical guide: https://github.com/databricks-academy/scalable-machine-learning-with-apache-spark

Objective:
  • Create data processing pipelines with Spark.
  • Build and tune machine learning models with Spark ML.
  • Track, version, and deploy models with MLflow.
  • Perform distributed hyperparameter tuning with Hyperopt.
  • Use Spark to scale the inference of single-node models.
  • Link to R2D3 visualization: http://www.r2d3.us/visual-intro-to-machine-learning-part-1/

5. Machine Learning with Databricks Professional

Practical guide: https://github.com/databricks-academy/ml-in-production-english

6. Data Analysis with Databricks SQL

7. Data Analysis with Databricks SQL Workshop by David Harris

Practical guide: https://databricks-academy.github.io/data-analysis-with-databricks-sql/v1.1.5/

Happy Learning!!

Updated as of: 27/8/2022

Repo: In creation phase

Cleared:

  • Machine Learning Scientist Associate with Databricks
  • Machine Learning Scientist Professional with Databricks

Languages: Jupyter Notebook 100.0%, Python 0.0%