teddy-murigi / AzureSynapseWorkflows

This project uses Azure Synapse Analytics pipelines to copy data from a local SQL Server table and a local CSV file into Azure Data Lake Gen 2. Serverless SQL pools then cleanse and transform the data so that it can be served for analytics and reporting through serverless Data Warehouse views.

This project demonstrates how to ingest data in diverse formats, both structured and unstructured, from various types of data sources, including relational databases, file systems, and SFTP locations. Azure Synapse Analytics transfers the data into an Azure Storage account, specifically Azure Data Lake Gen 2.

The project also illustrates how the ingested files, once in Azure Data Lake, can be processed with the features of Azure Synapse Analytics. As a practical example, the code in this project creates pipelines that copy data from a local SQL Server table and a local CSV file into Azure Data Lake. Serverless SQL pools then cleanse and transform the data into a state fit for use in serverless Data Lakehouse views.
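
The cleansing step described above could look roughly like the following T-SQL for a serverless SQL pool. This is a minimal sketch, not the project's actual code: the data source name raw_lake, the view name dbo.vw_customers_clean, the folder customers/, the three columns, and the <storage-account> and <container> placeholders are all hypothetical. It assumes the pipeline has already landed CSV files with a header row in the data lake and that the caller can read the storage account through Azure AD pass-through.

```sql
-- Hypothetical external data source pointing at the data lake container.
-- Replace <storage-account> and <container> with real values.
CREATE EXTERNAL DATA SOURCE raw_lake
WITH (LOCATION = 'https://<storage-account>.dfs.core.windows.net/<container>');
GO

-- Hypothetical view that reads the raw CSV files and applies light cleansing.
CREATE VIEW dbo.vw_customers_clean
AS
SELECT
    TRY_CAST(src.customer_id AS INT)                AS customer_id,   -- NULL if not numeric
    NULLIF(LTRIM(RTRIM(src.customer_name)), '')     AS customer_name, -- trim, blank becomes NULL
    TRY_CAST(src.signup_date AS DATE)               AS signup_date    -- NULL if not a valid date
FROM OPENROWSET(
        BULK 'customers/*.csv',        -- path relative to the data source (assumed layout)
        DATA_SOURCE = 'raw_lake',
        FORMAT = 'CSV',
        PARSER_VERSION = '2.0',
        HEADER_ROW = TRUE
     )
     WITH (
        customer_id   VARCHAR(20),     -- read every column as text first,
        customer_name VARCHAR(200),    -- then convert in the SELECT list
        signup_date   VARCHAR(30)
     ) AS src
WHERE TRY_CAST(src.customer_id AS INT) IS NOT NULL;  -- drop rows without a usable key
GO
```

Reading every column as text and converting with TRY_CAST keeps a single malformed value from failing the whole query; the view simply exposes NULL for it. Reporting tools can then query the view like an ordinary table while the data stays in the lake.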

The overall objective is to emulate a scenario common in real-world organizational data management: data from disparate sources, previously scattered in silos, is organized methodically. Aligning the data with a pre-established physical data model, designed around specific business requirements, makes richer analysis possible.

Languages

TSQL 100.0%