There are 24 repositories under the datawarehouse topic.
Dinky is a real-time data development platform based on Apache Flink, enabling agile data development, deployment and operation.
Postgres-native columnar storage extension
Dozer is a real-time data movement tool that leverages CDC from various sources and moves data into various sinks.
A free-to-use dbt package for creating and loading Data Vault 2.0 compliant Data Warehouses (powered by dbt, an open source data engineering tool, registered trademark of dbt Labs)
An open-source columnar data format designed for fast, real-time analytics on big data.
Free and open source schema versioning and database migration made natively with .NET/6. NEW THIS MAY 2022! v1.3.15 released!
A comprehensive guide to building a modern data warehouse with SQL Server, including ETL processes, data modeling, and analytics.
High-performance time-series database. 2.4M metrics/sec + 950K logs/sec + 940K traces/sec + 940K events/sec. One endpoint, one protocol. DuckDB + Parquet + Arrow. AGPL-3.0
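Dataplane is an Airflow-inspired unified data platform with additional data mesh and RPA capability to automate, schedule and design data pipelines and workflows. Dataplane is written in Golang with a React front end.
Hydra (Nine-Headed Dragon): for PB-scale knowledge-base data retrieval, intelligence systems, data platforms, and large-scale control and scheduling systems. Aimed at large-scale data collection, analysis, and intelligent data retrieval, taking the implementation of a large-scale distributed crawler search engine as an example.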
Make dbt great again! Extend dbt with plugins, local docs and custom adapters — fast, safe, and developer-friendly
Roadmap for Data Engineering
Timeseries Anomaly detection and Root Cause Analysis on data in SQL data warehouses and databases
Service for bulk-loading data to databases with automatic schema management (Redshift, Snowflake, BigQuery, ClickHouse, Postgres, MySQL)
OpenChatBI is an intelligent chat-based BI tool powered by large language models, designed to help users query, analyze, and visualize data through natural language conversations. It uses LangGraph and LangChain to build chat agents and workflows that support natural-language-to-SQL conversion and data analysis.
Accelerator to build a Microsoft Fabric modern data platform using pre-built reusable Fabric items and an orchestration ELT Framework
All of my individual learning materials, documents, and notes from the process of getting the Coursera IBM Data Engineer Professional Certificate specialization are stored in this repository.
A curated list of awesome Online Analytical Processing databases, frameworks, resources and other awesomeness.
Implementing an end-to-end tweets ETL/analysis pipeline.
End-to-end data engineering project
AlphaSQL provides integrated type and schema checking and parallelization for sets of SQL files, mainly for BigQuery
Python package for managing OHDSI clinical data models. Includes support for LLM-based plain-text queries, an MCP server, and FHIR import.
A library for data warehouse and data integration pattern and architecture documentation.
The Virtual Data Warehouse is a code generation and template management tool. It is part of the data solution automation ecosystem - the 'engine' for data solution automation.
Data warehouse for CouchDB
Repo for Data Warehouse Concepts, Design, and Data Integration by University of Colorado System (Coursera): notes, assignments, quizzes, and research papers
Generic interface exchange format for Data Warehouse Automation and ETL generation.
A DuckDB-powered command line interface for Snowflake security, governance, operations, and cost optimization.
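Awesome list for data pipelines
Back-end study notes. This project collects my notes from reading technical books and some source code. Topics include computer science fundamentals for back-end development, fundamentals of high-level languages, source-code reading notes, databases, and data mining, along with some practical problems encountered in real production scenarios. :-D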
Code/Notes for the Data Engineering Zoomcamp by DataTalksClub