QPeiran / demo_internal

rebase

demo_internal

Rebase of an old project.

This is a project for internal use, demonstrating the combined use of dbt, Snowflake, and Azure DevOps.

Screenshot

Data comes from Azure Blob Storage and/or local files and will be published to PowerBI.

The pipeline is triggered by a commit to the main branch.

At the moment I would like to start building a prototype data pipeline to process a small amount of batch data.

According to my plan, it should involve:

• Snowflake – data warehousing

• dbt – data transformation

• Azure DevOps – orchestration & automation

• GitHub – version control & repo host

• Ideally the data would be ingested from Blob Storage and published via PowerBI (a minimal ingestion sketch follows this list).

• Triggered periodically.
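As a rough illustration of the ingestion step, the sketch below downloads a CSV from Azure Blob Storage and loads it into a Snowflake table using the official Python libraries. The container, blob, warehouse, database, and table names are placeholders, not the real project configuration.

```python
# Minimal ingestion sketch (assumed names and credentials, not the project's real config).
# Requires: pip install azure-storage-blob snowflake-connector-python
import os

from azure.storage.blob import BlobServiceClient
import snowflake.connector

# Download a raw CSV from Blob Storage into the working directory.
blob_service = BlobServiceClient.from_connection_string(
    os.environ["AZURE_STORAGE_CONNECTION_STRING"]
)
blob_client = blob_service.get_blob_client(container="raw-data", blob="orders.csv")  # hypothetical
with open("orders.csv", "wb") as f:
    f.write(blob_client.download_blob().readall())

# Load the file into Snowflake via the table stage; dbt handles transformation downstream.
conn = snowflake.connector.connect(
    account=os.environ["SNOWFLAKE_ACCOUNT"],
    user=os.environ["SNOWFLAKE_USER"],
    password=os.environ["SNOWFLAKE_PASSWORD"],
    warehouse="DEMO_WH",  # hypothetical
    database="DEMO_DB",   # hypothetical
    schema="RAW",
)
try:
    cur = conn.cursor()
    cur.execute("CREATE TABLE IF NOT EXISTS RAW_ORDERS (ID NUMBER, AMOUNT NUMBER, ORDER_DATE DATE)")
    cur.execute("PUT file://orders.csv @%RAW_ORDERS OVERWRITE = TRUE")
    cur.execute("COPY INTO RAW_ORDERS FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)")
finally:
    conn.close()
```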

Potential todos:

  1. Ingest data from Azure Blob Storage;
  2. Create a role in Snowflake that is dedicated to dbt use (see the role-creation sketch after this list);
  3. Ingest streaming data;
  4. Containerize the data pipeline (or even make it a microservice);
  5. Embed SonarQube into the pipeline;
  6. Build an agent pool for the data engineering team only (to avoid installing dbt every time);
  7. Host the dbt docs on a separate port (outside of the agent);
  8. Talk to the architect and security people about connecting Snowflake and PowerBI (production environment);
  9. Use a linked service to connect to dbt Cloud (instead of installing dbt on my agent).
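For todo 2, here is a hedged sketch of what the role setup could look like, run through the Snowflake Python connector. The role, warehouse, database, schema, and service-user names are placeholders, and the exact grants would need to match the project's security model.

```python
# Sketch of a dbt-dedicated Snowflake role (assumed object names, not the project's real ones).
import os

import snowflake.connector

conn = snowflake.connector.connect(
    account=os.environ["SNOWFLAKE_ACCOUNT"],
    user=os.environ["SNOWFLAKE_USER"],
    password=os.environ["SNOWFLAKE_PASSWORD"],
    role="SECURITYADMIN",  # a role allowed to create roles and grant privileges
)
statements = [
    "CREATE ROLE IF NOT EXISTS DBT_ROLE",
    "GRANT USAGE ON WAREHOUSE DEMO_WH TO ROLE DBT_ROLE",
    "GRANT USAGE ON DATABASE DEMO_DB TO ROLE DBT_ROLE",
    "GRANT USAGE, CREATE TABLE, CREATE VIEW ON SCHEMA DEMO_DB.RAW TO ROLE DBT_ROLE",
    "GRANT ROLE DBT_ROLE TO USER DBT_SERVICE_USER",  # hypothetical service user for dbt
]
try:
    cur = conn.cursor()
    for stmt in statements:
        cur.execute(stmt)
finally:
    conn.close()
```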

Question:

  • Use an Azure DevOps agent or a Function App to execute the script? (pros / cons)
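Whichever host wins that comparison, the script itself can stay host-agnostic. Below is a minimal sketch, assuming the dbt CLI is installed on the host and the project lives in a `dbt_project/` folder (a placeholder path); it could run as an Azure DevOps agent job step or be wrapped by a Function App.

```python
# Host-agnostic entry point: runs dbt and propagates its exit code.
import subprocess
import sys


def run_dbt(project_dir: str = "dbt_project") -> int:
    """Run `dbt run` against the given project directory and return its exit code."""
    result = subprocess.run(
        ["dbt", "run", "--project-dir", project_dir],
        capture_output=True,
        text=True,
    )
    print(result.stdout)
    print(result.stderr, file=sys.stderr)
    return result.returncode


if __name__ == "__main__":
    sys.exit(run_dbt())
```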

About

rebase

License: MIT License


Languages

Jupyter Notebook 84.5%, HTML 13.2%, Python 2.3%