apache-iceberg

There are 36 repositories under the apache-iceberg topic.

matanolabs / matano
Open source security data lake for threat hunting, detection & response, and cybersecurity analytics at petabyte scale on AWS
aws cloud security big-data serverless apache-iceberg log-analytics log-management threat-hunting rust alerting cloud-native aws-security cloud-security cybersecurity secops security-tools dfir detection-engineering siem
Language:Rust 1447
apache / incubator-xtable
Apache XTable (incubating) is a cross-table converter for lakehouse table formats that facilitates interoperability across data processing systems and query engines.
apache-hudi apache-iceberg delta-lake
Language:Java 842
cuebook / cuelake
Use SQL to build ELT pipelines on a data lakehouse.
apache-iceberg delta lakehouse datalake data-lake elt etl data-engineering data-integration data-ingestion apache-spark spark-sql upsert incremental-updates data-transfer pipelines data-pipeline zeppelin-notebook sql
Language:JavaScript 285
lhbench / lhbench
Lakehouse storage system benchmark
apache-hudi apache-iceberg lakehouse benchmark cidr database databricks delta-lake
Language:Scala 65
dominikhei / Local-Data-LakeHouse
Sample Data Lakehouse deployed in Docker containers using Apache Iceberg, Minio, Trino and a Hive Metastore. Can be used for local testing.
apache-iceberg data-lake data-lakehouse hive-metastore lakehouse minio trino
Language:Dockerfile 51
dacort / modern-data-lake-storage-layers
Jupyter notebooks and AWS CloudFormation template to show how Hudi, Iceberg, and Delta Lake work
aws amazon-emr hudi iceberg apache-hudi apache-iceberg delta-lake
Language:Jupyter Notebook 47
buster-so / buster-platform
The open-source, AI-native data stack
ai analytics apache-iceberg dbms lakehouse starrocks business-intelligence data database warehouse
Language:HCL 32
aws-samples / transactional-datalake-using-apache-iceberg-on-aws-glue
Stream CDC into an Amazon S3 data lake in Apache Iceberg table format with AWS Glue Streaming and DMS
apache-iceberg aws-glue aws-dms aws-athena apache-spark
Language:Python 24
Bodo-inc / denali
An open-source, community-driven REST catalog for Apache Iceberg!
apache-iceberg catalog go golang iceberg
Language:Go 24
aws-samples / aws-glue-streaming-etl-with-apache-iceberg
Streaming ETL job cases in AWS Glue that integrate Iceberg and create an in-place updatable data lake on Amazon S3
aws-glue apache-iceberg aws-athena apache-spark aws-glue-streaming
Language:Python 17
tj--- / iceberg-demo
A sample implementation of stream writes to an Iceberg table on GCS using Flink and reading it using Trino
apache-flink apache-iceberg apache-kafka flink flink-stream-processing iceberg kafka trino gcs java
Language:Java 15
tlepple / iceberg-intro-workshop
Hands-on workshop with Apache Iceberg
apache-iceberg big-data dell dell-object-storage linux minio object-storage pyspark spark spark-sql spark-sql-s3 spark-streaming
Language:Shell 13
aws-samples / monitoring-apache-iceberg-table-metadata-layer
Sample code to collect Apache Iceberg metrics for table monitoring
apache-iceberg aws aws-cloudwatch aws-glue aws-lambda data-quality monitoring sam-cli apache-spark pyiceberg
Language:Python 12
tlepple / data_origination_workshop
Hands-on workshop with Iceberg, Redpanda, Debezium and Kafka-Connect
apache-iceberg debezium debeziumkafkaconnector iceberg kafka-connect minio postgresql pyspark python redpanda redpanda-console spark-streaming
Language:Shell 12
aws-samples / aws-glue-streaming-ingestion-from-kafka-to-apache-iceberg
This is a collection of Amazon CDK projects to show how to directly ingest streaming data from Amazon Managed Service for Apache Kafka (MSK) and MSK Serverless into an Apache Iceberg table in S3 with AWS Glue Streaming.
aws-s3 apache-iceberg apache-kafka aws-glue-streaming aws-msk aws-msk-serverless pyspark
Language:Python 10
aws-samples / iceberg-streaming-examples
This repo contains examples of high throughput ingestion using Apache Spark and Apache Iceberg. These examples cover IoT and CDC scenarios using best practices. The code can be deployed into any Spark compatible engine like Amazon EMR Serverless or AWS Glue. A fully local developer environment is also provided.
apache-iceberg apache-spark structured-streaming
Language:Java 10
YeonwooSung / MLOps
Miscellaneous code and writings for MLOps
ai ai-as-a-service aws llm llm-inference llm-ops ml-serving mlops multimodal bentoml triton-inference-server apache-iceberg data-intensive-applications docker kubernetes spark spark-nlp rag vector-database vectordb
Language:Jupyter Notebook 10
aws-samples / transactional-datalake-using-amazon-msk-serverless-and-apache-iceberg-on-aws-glue
Stream CDC into an Amazon S3 data lake in Apache Iceberg format with AWS Glue Streaming using Amazon MSK Serverless and MSK Connect (Debezium)
apache-iceberg aws-glue-streaming aws-msk aws-msk-serverless debezium kafka msk-connect
Language:Python 7
gordonmurray / apache_flink_and_iceberg
Using Apache Flink to write to S3 in Apache Iceberg format
apache-flink apache-iceberg parquet s3
7
fraibacas / lakehouse-poc
Run an open-source data LakeHouse locally using Docker Compose
apache-iceberg apache-superset docker-compose lakehouse prefect
Language:Python 6
davidvanegas2 / iceberg-s3-terraform-glue
Automated setup of Apache Iceberg on Amazon S3 using Terraform and AWS Glue Data Catalog. Explore the power of a Lakehouse architecture for data management and analysis, featuring schema discovery, metadata management, and efficient querying with Amazon Athena.
aws terraform apache-iceberg data-engineering aws-lambda lakehouse
Language:Python 5
aws-samples / automation-of-building-a-transactional-data-lake
apache-hudi apache-iceberg delta-lake transactional-data-lake
Language:Python 4
aws-samples / transactional-datalake-using-amazon-msk-and-apache-iceberg-on-aws-glue
Stream CDC into an Amazon S3 data lake in Apache Iceberg format with AWS Glue Streaming using Amazon MSK and MSK Connect (Debezium)
apache-iceberg apache-spark aws-athena aws-glue-streaming debezium kafka-connect mysql
Language:Python 3
JesuFemi-O / iceberg-integration-framework
A PoC open framework to manage data ingestion into Apache Iceberg tables
apache-iceberg lakehouse-platform pyiceberg
Language:Python 2
joewood / react-iceberg
React Components to visualize Apache Iceberg tables
apache-iceberg apache-spark reactjs minio s3 avro apache-arrow devcontainer docker-compose
Language:TypeScript 2
MOBIN-F / iceberg-spark-tpcds-benchmark
iceberg-spark-tpcds-benchmark
apache-iceberg apache-spark iceberg spark
Language:Scala 2
ev2900 / EMR_Studio_Iceberg
Apache Iceberg examples designed to be run on AWS Elastic Map Reduce (EMR) via EMR Studio or EMR Notebooks
apache-iceberg aws elastic-map-reduce emr iceberg
Language:Jupyter Notebook 1
ev2900 / Iceberg_EMR_Athena
Resources from a virtual tech talk / workshop: Set Up and Use Apache Iceberg Tables on Your Data Lake
apache-iceberg athena aws emr spark
Language:Jupyter Notebook 1
j3-signalroom / apache_flink-kickstarter
Examples of Apache Flink® applications showcasing the DataStream API and Table API in Java and Python, featuring AWS, GitHub, Terraform, and Apache Iceberg.
apache-flink flink-examples flink-stream-processing flink flink-kafka apache-iceberg iceberg aws-parameter-store aws-secrets-manager github-actions snowflake terraform-cloud aws-s3
Language:Java 1
jordipuig37 / iceberg-schema-evolution
A tool for learning Iceberg table format
apache-iceberg streamlit
Language:Python 1
masterchief2007 / floeberg
Experiments with Apache Iceberg
apache-iceberg java
Language:Java 1
j3-signalroom / j3-techstack-lexicon
J3's techStack Lexicon.
apache-flink flink apache-iceberg iceberg terraform terraform-cloud
0
j3-signalroom / linux_flink_with_iceberg
Apache Flink Docker image with Apache Iceberg support for Linux (i.e., non-Mac M1, M2, and M3 chips).
apache-flink flink apache-iceberg iceberg
Language:Dockerfile
j3-signalroom / mac_flink_with_iceberg
Apache Flink Docker image with Apache Iceberg support for Mac M1, M2, or M3 chips.
apache-flink apache-iceberg flink iceberg
Language:Dockerfile
johnymontana / hands-on-havasu-geoparquet
Notebook to accompany the "Hands-On With Havasu & GeoParquet" livestream
apache-iceberg apache-sedona geoparquet parquet sedonadb
Language:Jupyter Notebook
THeades / serverless-data-lakehouse
This is an example project showing how to build a serverless data lakehouse on AWS using Terraform, Apache Iceberg and Spark.
apache-iceberg apache-spark aws data-engineering data-lakehouse terraform
