Data professionals from all walks of life will benefit from this comprehensive introduction to the components of the Databricks Lakehouse Platform that directly support putting ETL pipelines into production. Students will use SQL and Python to define and schedule pipelines that incrementally process new data from a variety of data sources to power analytic applications and dashboards in the Lakehouse. This course offers hands-on instruction in the Databricks Data Science & Engineering Workspace, Databricks SQL, Delta Live Tables, Databricks Repos, Databricks Task Orchestration, and Unity Catalog.
- Leverage the Databricks Lakehouse Platform to perform core responsibilities for data pipeline development
- Use SQL and Python to write production data pipelines to extract, transform, and load data into tables and views in the Lakehouse
- Simplify data ingestion and incremental change propagation using Databricks-native features and syntax, including Delta Live Tables
- Orchestrate production pipelines to deliver fresh results for ad-hoc analytics and dashboarding
- Identify core features of Spark and Databricks.
- Describe how DataFrames are created and evaluated in Spark.
- Apply the DataFrame transformation API to process and analyze data.
- Demonstrate how Spark is optimized and executed on a cluster.
- Apply Delta and Structured Streaming to process streaming data.
- Udemy course on Data Engineering with Databricks.
- End-to-end execution.
Practical guide: https://github.com/databricks-academy/scalable-machine-learning-with-apache-spark
- Create data processing pipelines with Spark.
- Build and tune machine learning models with Spark ML.
- Track, version, and deploy models with MLflow.
- Perform distributed hyperparameter tuning with Hyperopt.
- Use Spark to scale the inference of single-node models.
- Link to R2D3 visualization: http://www.r2d3.us/visual-intro-to-machine-learning-part-1/
Practical guide: https://github.com/databricks-academy/ml-in-production-english
Practical guide: https://databricks-academy.github.io/data-analysis-with-databricks-sql/v1.1.5/
Updated as of: 27/8/2022
- Machine Learning Scientist Associate with Databricks
- Machine Learning Scientist Professional with Databricks