There are 55 repositories under the data-warehouse topic.
A curated list of awesome big data frameworks, resources and other awesomeness.
Greenplum Database - Massively Parallel PostgreSQL for Analytics. An open-source massively parallel data platform for analytics, machine learning and AI.
The data warehouse for operational workloads.
Privacy- and security-focused Segment alternative, written in Golang and React
Hydra: Column-oriented Postgres. Add scalable analytics to your project in minutes.
COVID-19/2019-nCoV Infection Time Series Data Warehouse
The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.
[NOT MAINTAINED] Lightweight Python OLAP framework for multi-dimensional data analysis
TensorBase is a new big data warehouse built with modern efforts.
A few projects related to data engineering, including data modeling, infrastructure setup on the cloud, data warehousing, and data lake development.
Useful scripts, UDFs, views, and other utilities for migration and data warehouse operations in BigQuery.
Scratch is a Swiss Army knife for big data.
Personal Data Engineering Projects
🔥 Open Source Reverse ETL and Customer Data Platform (CDP). An open-source alternative to Hightouch, Census, and RudderStack.
Data API Framework for AI Agents and Data Apps
Supercharge BigQuery with BigFunctions
DomainMOD is an open source application written in PHP & MySQL used to manage your domains and other internet assets in a central location. DomainMOD also includes a Data Warehouse framework that allows you to import your web server data so that you can view, export, and report on your live data.
One framework to develop, deploy and operate data workflows with Python and SQL.
Snowflake infrastructure-as-code. Provision environments, automate deploys, CI/CD. Manage RBAC, users, roles, and data access. Declarative Python Resource API. Change Management tool for the Snowflake data warehouse.
Construct a modern data stack and orchestrate the workflows to create high-quality data for analytics and ML applications.
Configurable Extract, Transform, and Load
Main repo including core data model, data marts, reference data, terminology, and the clinical concept library
Cloudberry Database - Next generation unified database for Analytics and AI
A full data warehouse infrastructure with ETL pipelines running inside Docker, using Apache Airflow for data orchestration, AWS Redshift as the cloud data warehouse, and Metabase for data visualizations such as analytical dashboards.
Airbyte made simple (no UI, no database, no cluster)
This is a template you can use for your next data engineering portfolio project.
ETL with Python - Taught at DWH course 2017 (TAU)
Real-time Data Warehouse with Apache Flink & Apache Kafka & Apache Hudi
Turn your APM data store into a Data Warehouse with advanced reporting, including entities, configuration, metrics, flowmaps, events, snapshots and call graph flame graphs