bretamyers / Azure-Synapse-Lakehouse-Sync

Azure Synapse Lakehouse Sync

Description

Azure Synapse Lakehouse Sync provides a straightforward way to synchronize modeled Gold Zone data from your data lake to your Synapse Analytics data warehouse. Through a series of Databricks notebooks and Synapse Analytics pipelines, it offers a working example of how to continually synchronize your tables.

Additionally, it uses the Change Data Feed capability available in Delta Lake 2.x to track row-level changes to your Gold Zone tables, which makes extracts of changed data significantly easier and more performant. Best practices are then used to stage, ingest, and store the data in Azure Synapse Dedicated SQL in a performant, optimized way. The synchronization schedule can be set to whatever interval suits your environment, from every 10 minutes to once a day.
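To make the Change Data Feed idea concrete, here is a minimal sketch in plain Python of how a batch of change rows might be reduced to the net upserts and deletes that a warehouse load needs. It is an illustration under stated assumptions, not the logic used by this project's notebooks. The metadata columns `_change_type`, `_commit_version`, and `_commit_timestamp` are the ones Delta Lake adds to change feed output; the function name `collapse_change_feed`, the `id` key column, and the sample rows are hypothetical. In a Databricks notebook the rows would typically come from a read such as `spark.read.format("delta").option("readChangeFeed", "true").option("startingVersion", n)`, on a table with the `delta.enableChangeDataFeed` property set.

```python
from typing import Any, Dict, Iterable, List, Tuple

Row = Dict[str, Any]

# Metadata columns that Delta Lake adds to change feed output.
CDF_COLUMNS = {"_change_type", "_commit_version", "_commit_timestamp"}


def collapse_change_feed(rows: Iterable[Row], key: str) -> Tuple[List[Row], List[Any]]:
    """Reduce change feed rows to the net effect per key (hypothetical helper).

    Returns (upserts, deletes): the latest row image for every key that still
    exists, and the keys whose most recent change was a delete.
    Assumes rows are supplied in commit order when versions tie.
    """
    latest: Dict[Any, Row] = {}
    for row in rows:
        # Pre-images hold the old value of an updated row; only the new state matters here.
        if row["_change_type"] == "update_preimage":
            continue
        current = latest.get(row[key])
        if current is None or row["_commit_version"] >= current["_commit_version"]:
            latest[row[key]] = row

    upserts: List[Row] = []
    deletes: List[Any] = []
    for k, row in latest.items():
        if row["_change_type"] == "delete":
            deletes.append(k)
        else:
            # Strip the change feed metadata before handing the row to the load step.
            upserts.append({c: v for c, v in row.items() if c not in CDF_COLUMNS})
    return upserts, deletes


# Hypothetical sample: two inserts, one update, one delete across three commits.
sample_rows = [
    {"id": 1, "amount": 10, "_change_type": "insert", "_commit_version": 1},
    {"id": 2, "amount": 5, "_change_type": "insert", "_commit_version": 1},
    {"id": 1, "amount": 10, "_change_type": "update_preimage", "_commit_version": 2},
    {"id": 1, "amount": 12, "_change_type": "update_postimage", "_commit_version": 2},
    {"id": 2, "amount": 5, "_change_type": "delete", "_commit_version": 3},
]
upserts, deletes = collapse_change_feed(sample_rows, key="id")
```

Collapsing to one row per key keeps the staged extract small regardless of how many times a row changed between synchronization runs, which is one reason a change feed can be cheaper to extract from than a full table snapshot.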

Azure Synapse Lakehouse Sync is designed to be a fully automated, self-healing, hands-off approach to continually synchronizing your data lake with your data warehouse.

Video: Azure Synapse Lakehouse Sync overview (Azure.Synapse.Lakehouse.Sync.Overview.mp4)

Using Azure Synapse Lakehouse Sync

Self Deployment: Instructions for deploying, configuring, and using Azure Synapse Lakehouse Sync in your own environment.

Tutorial Environment: Deploys a fully working Azure Synapse Lakehouse Sync tutorial environment in your Azure subscription. This is a great way to see how Azure Synapse Lakehouse Sync works end to end.

Languages

Jupyter Notebook 61.1%, Shell 19.0%, Bicep 16.7%, TSQL 3.2%