There are 6 repositories under the data-infrastructure topic.
Postgres operator creates and manages PostgreSQL clusters running in Kubernetes
Production PostgreSQL for Kubernetes, from high-availability Postgres clusters to full-scale database-as-a-service.
Preswald is a WASM packager for Python-based interactive data apps: bundle complete, complex data workflows, particularly visualizations, into single files that run entirely in-browser using Pyodide, DuckDB, Pandas, Plotly, Matplotlib, etc. Build dashboards, reports, and notebooks that run offline, load fast, and share like a document.
Data transformation framework for AI. Ultra performant, with incremental processing. 🌟 Star if you like it!
TensorBase is a new big data warehouse built with modern efforts.
A distributed event bus that implements a RESTful API abstraction on top of Kafka-like queues
A battle-tested, flexible & comprehensive monitoring solution for your PostgreSQL databases
The Data Engineering Book - a data engineering book by Thai people, for Thai people
Build & Learn Data Engineering, Machine Learning over Kubernetes. No Shortcut approach.
Kanadi is a Nakadi client for Scala
OpenSnowcat Enricher (Apache 2.0 License)
A generic data pipeline that maps Elasticsearch documents to BigQuery table rows
Information relating to topics on Data Engineering, Data Infrastructure, Data Storing, Data Warehouses and Business Analysis. For those interested in both conceptual theory and use case examples for database design and development.
Service for sharing user consent to cookies across multiple domains
Collections of POC/dev data infrastructure. | #SE
Bring Infrastructure as Code best practices to your data workflows with Kestra and Terraform
A fake GOV.UK homepage and start pages for SDE prototype services
MPDD Calculator for Atomistic Line Graph Neural Network Deployment
Business intelligence architecture lab for students at EPSI and DC Paris. The goal is to deploy a data architecture that runs from data collection through to presentation as data visualizations, by way of a data lake, a data warehouse, and a data mart
A practical data mesh reference implementation, powered by open-source.
Export Google Analytics (GA4 and UA) settings
SDE prototype dummy service - Hexagrams as a Service
Processing code for Scientific Data Descriptor paper
CLI tool for automatic data platform deployment
Infrastructure for Data-Engineer DBT
An AWS-based data pipeline to extract, process, store, and monitor Yelp "health-related" facility data in support of ongoing health system initiatives.
Skeleton streaming data platform on GCP...
The purpose of this repository is to create a data infrastructure that will communicate with the STEMNET server at the University of Alabama in Huntsville. In particular, the goal is to give anyone the capability to create clean daily files from all available stations on Linux machines.
Fork of Zalando Postgres Spilo with pgvecto.rs and VectorChord extensions installed (Immich-compatible)
An opinionated template for Data Packages built with Seedcase packages.