Data Migration Project: Snowflake to Databricks

Overview

In response to the client's need to reduce costs associated with Snowflake, this project aimed to migrate marketing data from Snowflake to Databricks. The project focused on data modeling, transformation, and transportation to ensure data accuracy, accessibility, and reliability.
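
As a rough illustration of the transport step (not the project's exact pipeline), the sketch below assumes the Spark Snowflake connector is available in the Databricks workspace; all connection values, schema names, and table names are placeholders.

    -- Hypothetical example: expose a Snowflake table inside Databricks through
    -- the Spark Snowflake connector, then materialize it as a Delta table so
    -- downstream consumers no longer depend on the Snowflake warehouse.
    -- All option values below are placeholders.
    CREATE TABLE IF NOT EXISTS staging.snowflake_campaigns
    USING snowflake
    OPTIONS (
      sfUrl       'myaccount.snowflakecomputing.com',
      sfUser      'MIGRATION_USER',
      sfPassword  '<secret>',
      sfDatabase  'MARKETING',
      sfSchema    'PUBLIC',
      sfWarehouse 'MIGRATION_WH',
      dbtable     'CAMPAIGNS'
    );

    -- One-off copy into a managed table (Delta is the default format on Databricks);
    -- incremental loads would additionally need a watermark or change-tracking column.
    CREATE OR REPLACE TABLE marketing.campaigns_raw
    AS SELECT * FROM staging.snowflake_campaigns;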

Project Scope and Objectives

The primary objectives of this project were:

  • Data Transformation: Restructuring and transforming marketing data to align with the Databricks environment. This was achieved using dbt (Data Build Tool) to automate the creation of data models and transformations, ensuring accuracy, consistency, and optimal performance (a minimal model sketch follows this list).
  • Migration Strategy: Developing and executing a strategy to migrate data seamlessly from Snowflake to Databricks, minimizing downtime and preventing data loss.
  • Optimization: Enhancing data processing capabilities and optimizing query performance in the Databricks environment.
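
To make the dbt step concrete, the following is a minimal sketch of what a staging model might look like; the source name (raw_marketing) and the table and column names are hypothetical rather than taken from the actual project.

    -- models/staging/stg_campaign_events.sql  (hypothetical dbt model)
    -- Standardizes raw campaign events into a typed, deduplicated staging table
    -- that downstream marts can build on.
    {{ config(materialized='table') }}

    with source as (

        select * from {{ source('raw_marketing', 'campaign_events') }}

    ),

    renamed as (

        select
            cast(event_id as string)            as event_id,
            cast(campaign_id as string)         as campaign_id,
            lower(trim(channel))                as channel,
            cast(event_timestamp as timestamp)  as event_at,
            cast(spend_usd as decimal(18, 2))   as spend_usd
        from source

    )

    select distinct * from renamed

dbt resolves the {{ source(...) }} and {{ config(...) }} macros at compile time and materializes the result in the target catalog, so the same model definition can be run against both environments while the migration is in progress.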

Key Responsibilities

As part of this project, the responsibilities included:

  • Ensuring all properties of the data were preserved during migration (a simple verification query is sketched after this list).
  • Gathering marketing data from various sources, both structured and unstructured, to ensure comprehensive and reliable data availability across the company.
  • Conducting stakeholder meetings to align the data modeling and migration process with organizational requirements and expectations.
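
One common way to check that nothing was lost in transit (not necessarily the exact procedure used in this project) is to compare row counts and simple aggregates between the source and the migrated table; the table and column names below are placeholders carried over from the earlier sketches.

    -- Hypothetical post-migration check: any returned row signals a mismatch
    -- between the Snowflake-backed reference table and the migrated Delta table.
    WITH source_stats AS (
        SELECT COUNT(*) AS row_count, SUM(spend_usd) AS total_spend
        FROM staging.snowflake_campaigns
    ),
    target_stats AS (
        SELECT COUNT(*) AS row_count, SUM(spend_usd) AS total_spend
        FROM marketing.campaigns_raw
    )
    SELECT s.row_count   AS source_rows,
           t.row_count   AS target_rows,
           s.total_spend AS source_spend,
           t.total_spend AS target_spend
    FROM source_stats s CROSS JOIN target_stats t
    WHERE s.row_count <> t.row_count
       OR s.total_spend <> t.total_spend;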

Privacy and Confidentiality Considerations

Throughout the project, data privacy was a top priority. Fictitious names and data obfuscation techniques were employed to protect sensitive information without compromising the project's integrity or functionality. Additionally, several properties of the original code have been changed to meet confidentiality requirements; these changes do not affect the overall functionality or performance of the code.
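
The obfuscation techniques actually used are intentionally not reproduced here; purely as an illustration, a view like the one below could replace direct identifiers with one-way hashes and masked values before data is shared more widely (all object and column names are invented).

    -- Illustration only: replace direct identifiers with hashed or masked values.
    CREATE OR REPLACE VIEW marketing.contacts_obfuscated AS
    SELECT
        sha2(CAST(contact_id AS STRING), 256)                    AS contact_key,
        CONCAT('contact_', SUBSTR(sha2(full_name, 256), 1, 8))   AS display_name,
        regexp_replace(email, '^[^@]+', '***')                   AS email_masked,
        signup_date
    FROM marketing.contacts_raw;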

Project Structure

The project is organized as follows:

  • Data Modeling: Scripts and processes for restructuring marketing data to fit Databricks schemas.
  • SQL Files: SQL scripts for creating tables, views, and performing data transformations.
  • Documentation: Detailed project documentation, data dictionaries, and README files.
  • Tests: Procedures to ensure data quality and integrity post-transformation (an example dbt test is sketched below).
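
As an example of the kind of check that can live under Tests, the sketch below is a dbt singular test: dbt runs the query and marks the test as failed if it returns any rows. The model and column names reuse the hypothetical ones from the earlier sketch.

    -- tests/assert_no_negative_spend.sql  (hypothetical dbt singular test)
    -- Fails if any campaign event reports negative spend.
    select
        event_id,
        spend_usd
    from {{ ref('stg_campaign_events') }}
    where spend_usd < 0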

Contributing

Contributions are welcome! If you have suggestions, improvements, or feedback, please submit an issue or a pull request.

About

This project highlights expertise in data migration, normalization, and standardization through a successful transfer of marketing data from Snowflake to Databricks. It emphasizes optimized data flow and improved accessibility through standardization, alongside a commitment to ethical data practices.