Repositories under the data-lakehouse topic:
Qbeast-spark: DataSource enabling multi-dimensional indexing and efficient data sampling. Big Data, free from the unnecessary!
A curated list of open source tools used in analytical stacks and data engineering ecosystem
Sample Data Lakehouse deployed in Docker containers using Apache Iceberg, MinIO, Trino, and a Hive Metastore. Can be used for local testing.
DatAasee - A Metadata-Lake for Libraries
My M.Sc. dissertation: a modern data platform using DataOps, Kubernetes, and the cloud-native ecosystem to build a resilient Big Data platform on a Data Lakehouse architecture, serving as the foundation for Machine Learning (MLOps) and Artificial Intelligence (AIOps).
🦖 Efficiently evolve your old fixed-length data files into more modern file formats, fully parallelized!
The project aims to process Formula 1 racing data, create an automated data pipeline, and make the data available for presentation and analysis purposes.
This repository is a place for the Data Warehousing course at the Information Systems & Analytics department, Santa Clara University.
STEDI project
A data lakehouse at home with Docker Compose.
This project overhauls a university's data infrastructure to improve efficiency, security, and scalability, delivering a unified data management solution.
#Test - Create a Data Lakehouse in Kubernetes
This project implements an end-to-end tech stack for a data platform, intended for local development.
Everything you need to know about DuckDB
S3 infrastructure for data engineers
An example project showing how to build a serverless data lakehouse on AWS using Terraform, Apache Iceberg, and Spark.