Extract, Transform, Load (ETL) “is a data integration process that combines, cleans and organizes data from multiple sources into a single, consistent data set for storage in a data warehouse, data lake or other target system” (IBM).
During data extraction, raw data is copied or exported from source locations to a staging area. The sources can vary widely and may hold structured or unstructured data.
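A minimal sketch of the extraction step, assuming two hypothetical sources: a CSV export (structured) and a JSON feed (semi-structured), both inlined as strings so the example runs on its own. The names `CSV_SOURCE`, `JSON_SOURCE` and `extract` are illustrative only. Records are copied into an in-memory staging list without cleaning, because cleaning belongs to the transform step.

```python
import csv
import io
import json

# Hypothetical raw sources, inlined for illustration.
CSV_SOURCE = """id,name,amount
1,Alice,10.5
2,Bob,20
"""
JSON_SOURCE = '[{"id": 3, "name": "Carol", "amount": "7.25"}]'


def extract(csv_text: str, json_text: str) -> list[dict]:
    """Copy raw records from each source into a staging list, unchanged."""
    staging = []
    # CSV rows arrive as dicts of strings.
    for row in csv.DictReader(io.StringIO(csv_text)):
        staging.append({"source": "csv", **row})
    # JSON records keep whatever types the feed used.
    for record in json.loads(json_text):
        staging.append({"source": "json", **record})
    return staging


staging_area = extract(CSV_SOURCE, JSON_SOURCE)
print(len(staging_area))  # 3
```

Note that the staged values are still inconsistent (CSV ids are strings, the JSON id is an integer); reconciling such differences is left to the transform step.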
In the staging area, the data is transformed for its intended use. Transformation can include:
•Filtering, cleaning, aggregating, de-duplicating, and validating;
•Performing calculations, translations, summarizations or conversions (units of measurement, currency, ...);
•Conducting audits to ensure data quality and compliance, and computing metrics;
•Removing, encrypting or otherwise protecting private data (e.g., to comply with GDPR);
•Formatting the data into tables or joined tables that match the target schema, and changing or updating titles and descriptions.
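Several of the transformations listed above can be combined in a single pass. The sketch below assumes hypothetical staged rows and a fixed, made-up exchange rate (`USD_TO_EUR`); it validates, de-duplicates, cleans, converts currency, pseudonymizes e-mail addresses with a SHA-256 hash and renames fields to a hypothetical target schema, then computes one summary metric. Hashing is pseudonymization rather than full anonymization, so it is only one of several possible ways to protect private data.

```python
import hashlib

# Hypothetical rows as they might sit in the staging area (all strings).
STAGED = [
    {"id": "1", "name": " Alice ", "email": "alice@example.com", "amount_usd": "10.00"},
    {"id": "1", "name": "Alice", "email": "alice@example.com", "amount_usd": "10.00"},  # duplicate
    {"id": "2", "name": "Bob", "email": "bob@example.com", "amount_usd": ""},  # missing amount
    {"id": "3", "name": "Carol", "email": "carol@example.com", "amount_usd": "20.00"},
]

USD_TO_EUR = 0.9  # assumed fixed rate, for illustration only


def transform(rows):
    seen_ids = set()
    out = []
    for row in rows:
        # Validate: drop rows without an amount.
        if not row["amount_usd"].strip():
            continue
        # De-duplicate on the source id.
        if row["id"] in seen_ids:
            continue
        seen_ids.add(row["id"])
        out.append({
            # Rename fields to the (hypothetical) target schema and cast types.
            "customer_id": int(row["id"]),
            # Clean stray whitespace.
            "customer_name": row["name"].strip(),
            # Replace the raw e-mail with a one-way hash (pseudonymization).
            "email_hash": hashlib.sha256(row["email"].encode("utf-8")).hexdigest(),
            # Convert currency.
            "amount_eur": round(float(row["amount_usd"]) * USD_TO_EUR, 2),
        })
    return out


clean_rows = transform(STAGED)
# Compute a simple metric over the cleaned data.
total_eur = round(sum(r["amount_eur"] for r in clean_rows), 2)
print(total_eur)  # 27.0
```

Keeping each rule as a separate, commented branch makes it easy to see which transformation removed or changed a given row.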
In the last step, loading, the transformed data is moved from the staging area into a target repository (data warehouse, data lake, database), where it is ready for querying, reporting, analytics or other downstream processes.
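A minimal sketch of the load step, assuming an in-memory SQLite database as a stand-in for a real data warehouse and a hypothetical `sales` table as the target schema. The rows are inserted inside a transaction so that a failure leaves the target unchanged, and a query afterwards shows the data is ready for downstream use.

```python
import sqlite3

# Hypothetical transformed rows, already matching the target schema.
TRANSFORMED = [
    {"customer_id": 1, "customer_name": "Alice", "amount_eur": 9.0},
    {"customer_id": 3, "customer_name": "Carol", "amount_eur": 18.0},
]


def load(rows, conn):
    # The connection context manager commits on success and rolls back on error.
    with conn:
        conn.execute(
            """CREATE TABLE IF NOT EXISTS sales (
                   customer_id INTEGER PRIMARY KEY,
                   customer_name TEXT NOT NULL,
                   amount_eur REAL NOT NULL
               )"""
        )
        # Named placeholders map each dict key to a column.
        conn.executemany(
            "INSERT INTO sales (customer_id, customer_name, amount_eur) "
            "VALUES (:customer_id, :customer_name, :amount_eur)",
            rows,
        )


conn = sqlite3.connect(":memory:")  # stand-in for the target repository
load(TRANSFORMED, conn)
total = conn.execute("SELECT SUM(amount_eur) FROM sales").fetchone()[0]
print(total)  # 27.0
```

In a production pipeline the target would typically be a warehouse or lake with its own bulk-loading tools; the transactional pattern shown here is the part that carries over.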