fernandoprass / etl

ETL - Extract, Transform, Load

Extract, Transform, Load (ETL)

Extract, Transform, Load (ETL) “is a data integration process that combines, cleans and organizes data from multiple sources into a single, consistent data set for storage in a data warehouse, data lake or other target system” (IBM).

Extract

During data extraction, raw data is copied or exported from source locations to a staging area. The sources can be of many kinds, structured or unstructured.
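As a rough illustration of the extract step, the sketch below copies raw records from two sources into an in-memory staging area without changing them. The source contents, the `extract_csv` and `extract_json_lines` helpers, and the `staging` dictionary are all hypothetical names chosen for this example; a real pipeline would more likely read from files, APIs, or databases.

```python
import csv
import io
import json


def extract_csv(text):
    """Parse CSV text into a list of row dicts, leaving every value as a raw string."""
    return list(csv.DictReader(io.StringIO(text)))


def extract_json_lines(text):
    """Parse newline-delimited JSON into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


# Hypothetical sources, inlined as strings so the sketch is self-contained.
CSV_SOURCE = "id,name,amount\n1,Ana,10.50\n2,Bob,7\n"
JSON_SOURCE = '{"id": 3, "name": "Cy", "amount": 4.25}\n'

# The staging area here is just a dict of raw copies keyed by source name.
staging = {
    "orders_csv": extract_csv(CSV_SOURCE),
    "orders_json": extract_json_lines(JSON_SOURCE),
}
```

Note that nothing is cleaned or converted at this point: the CSV values stay strings, and the two sources keep their own shapes until the transform step.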

Transform

In the staging area, the data is transformed for its intended use. Transformation can include:

• Filtering, cleaning, aggregating, de-duplicating, and validating;

• Performing calculations, translations, summarizations or conversions (units of measurement, currency, ...);

• Conducting audits to ensure data quality and compliance, and computing metrics;

• Removing, encrypting or protecting private data (for example, to comply with GDPR);

• Formatting the data into tables or joined tables to match the target schema, and changing or updating titles and descriptions.
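A few of the transformations listed above can be sketched in one small function: validation, de-duplication, a currency conversion, masking of private data, and renaming fields to fit a target schema. Everything specific here is an assumption made for illustration: the input field names (`id`, `email`, `amount_eur`), the output field names, and the fixed `USD_PER_EUR` rate. The SHA-256 digest is only a stand-in for pseudonymization, not encryption and not a complete privacy measure.

```python
import hashlib

# Hypothetical fixed conversion rate, used only for this example.
USD_PER_EUR = 1.10


def transform(rows):
    """Validate, de-duplicate, convert, and mask staged rows."""
    seen_ids = set()
    result = []
    for row in rows:
        # Validation: skip rows whose id or amount cannot be parsed.
        try:
            order_id = int(row["id"])
            amount_eur = float(row["amount_eur"])
        except (KeyError, TypeError, ValueError):
            continue
        # De-duplication: keep only the first row seen for each id.
        if order_id in seen_ids:
            continue
        seen_ids.add(order_id)
        # Masking: replace the e-mail address with a digest of its normalized form.
        email = str(row.get("email", "")).strip().lower()
        email_hash = hashlib.sha256(email.encode("utf-8")).hexdigest()
        # Conversion and renaming to the (assumed) target schema.
        result.append({
            "order_id": order_id,
            "amount_usd": round(amount_eur * USD_PER_EUR, 2),
            "email_hash": email_hash,
        })
    return result


staged = [
    {"id": "1", "email": "Ana@Example.com", "amount_eur": "10.00"},
    {"id": "1", "email": "ana@example.com", "amount_eur": "10.00"},  # duplicate id
    {"id": "2", "email": "bob@example.com", "amount_eur": "n/a"},    # invalid amount
]
clean = transform(staged)
```

With this input, only the first row survives: the second is dropped as a duplicate and the third fails validation.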

Load

In this last step, the transformed data is moved from the staging area into a target repository (data warehouse, data lake, or database), where it is ready for querying, reporting, analytics, or any other downstream process.
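As a minimal sketch of the load step, the example below writes already-transformed rows into an in-memory SQLite database, which stands in for the target repository. The `orders` table, its columns, and the `load` helper are assumptions made for this example, not part of the repository described here.

```python
import sqlite3


def load(rows, conn):
    """Insert transformed rows into the target table, replacing rows with the same key."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders ("
        "order_id INTEGER PRIMARY KEY, amount_usd REAL NOT NULL)"
    )
    # The connection context manager commits on success and rolls back on error.
    with conn:
        conn.executemany(
            "INSERT OR REPLACE INTO orders (order_id, amount_usd) "
            "VALUES (:order_id, :amount_usd)",
            rows,
        )


# Hypothetical output of a transform step.
transformed = [
    {"order_id": 1, "amount_usd": 11.0},
    {"order_id": 2, "amount_usd": 7.7},
]

conn = sqlite3.connect(":memory:")
load(transformed, conn)
load(transformed, conn)  # running the load again does not duplicate rows

row_count, total_usd = conn.execute(
    "SELECT COUNT(*), SUM(amount_usd) FROM orders"
).fetchone()
```

Keying the table on `order_id` and using `INSERT OR REPLACE` is one way to make the load safe to re-run; other targets offer their own upsert or merge mechanisms.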
