There are 7 repositories under the data-warehousing topic.
Flink Connector for Apache Doris
Cluster manager for Apache Doris
A full data warehouse infrastructure with ETL pipelines running inside docker on Apache Airflow for data orchestration, AWS Redshift for cloud data warehouse and Metabase to serve the needs of data visualizations such as analytical dashboards.
Real-time Data Warehouse with Apache Flink & Apache Kafka & Apache Hudi
Example repository showing how to build a data platform with Prefect, dbt and Snowflake
Spark Connector for Apache Doris
Apache Doris Website
This is a top level repository for code examples related to Data Warehousing and Very Large Databases.
Guide to data platforms and tools
Self-managed third-party dependencies for Apache Doris
Modeled for longitudinal storage and reporting of P-20W data, the Common Education Data Standards (CEDS) Data Warehouse implements star schema data warehouse normalization techniques for improved query performance.
Simple and flexible application to manage configuration data, aka lists of values.
Save data from Instagram takeout to a SQLite database
Open-source Twitter collection and archiving tool for tracking specific topics and collecting bulk data.
Zillion Web: A Demo UI and Web API for Zillion
Streaming data pipelines for real-time data warehousing. Includes fully managed connectors (PostgreSQL CDC, Snowflake).
CSC603: Data Warehousing and Mining [DWM] & CSL603: Data Warehousing and Mining Lab [DWM Lab] <Semester VI>
Business Intelligence and Data Warehousing Project
Various Projects on Python related to Data Engineering
A comprehensive educational resource hub dedicated to mastering Microsoft Fabric, offering in-depth tutorials, real-world use cases, and hands-on guides for seamless end-to-end analytics
End-to-end data engineering processes for the Nigeria Health Facility Registry (HFR). The project leveraged Selenium, Pandas, PySpark, PostgreSQL and Airflow.
Programs for various subjects of Computer Engineering
An analytics engineering sandbox focusing on real estate prices in Cook County, IL
Starter project for building an ETL pipeline using SSIS in Visual Studio 2019
A data warehouse and business intelligence project on a stock market dataset to answer non-trivial BI queries.
IBM Data Engineering - Master SQL, RDBMS, ETL, Data Warehousing, NoSQL, Big Data and Spark with hands-on job-ready skills
This repository includes the demos and code I use to play around with Azure Synapse Analytics
Sparkify Data Warehouse Project for song play analysis
This repository is a place for the Data Warehousing course at the Information Systems & Analytics department, Santa Clara University.
Chinese Hushen stock market analysis project
The schemabase Integration framework to build custom data movers between different cloud services. Using BullMQ, Webhooks, Prisma Database and more
Batch/stream ETL pipeline of NOAA GLM dataset, using Python frameworks: Dagster, PySpark and Parquet storage.