There are 3 repositories under the apache-iceberg topic.
Open source security data lake for threat hunting, detection & response, and cybersecurity analytics at petabyte scale on AWS
Apache XTable (incubating) is a cross-table converter for lakehouse table formats that facilitates interoperability across data processing systems and query engines.
Sample Data Lakehouse deployed in Docker containers using Apache Iceberg, MinIO, Trino and a Hive Metastore. Can be used for local testing.
Jupyter notebooks and AWS CloudFormation template to show how Hudi, Iceberg, and Delta Lake work
The open-source, AI-native data stack
Stream CDC into an Amazon S3 data lake in Apache Iceberg table format with AWS Glue Streaming and DMS
Streaming ETL job cases in AWS Glue to integrate Iceberg and create an in-place updatable data lake on Amazon S3
A sample implementation of stream writes to an Iceberg table on GCS using Flink and reading it using Trino
Hands-on workshop with Apache Iceberg
Sample code to collect Apache Iceberg metrics for table monitoring
Hands-on workshop with Iceberg, Redpanda, Debezium and Kafka-Connect
This is a collection of Amazon CDK projects to show how to directly ingest streaming data from Amazon Managed Service for Apache Kafka (MSK) and MSK Serverless into an Apache Iceberg table in S3 with AWS Glue Streaming.
This repo contains examples of high throughput ingestion using Apache Spark and Apache Iceberg. These examples cover IoT and CDC scenarios using best practices. The code can be deployed into any Spark compatible engine like Amazon EMR Serverless or AWS Glue. A fully local developer environment is also provided.
Miscellaneous code and writings for MLOps
Stream CDC into an Amazon S3 data lake in Apache Iceberg format with AWS Glue Streaming using Amazon MSK Serverless and MSK Connect (Debezium)
Using Apache Flink to write to S3 in Apache Iceberg format
Run an open-source data LakeHouse locally using Docker Compose
Automated setup of Apache Iceberg on Amazon S3 using Terraform and AWS Glue Data Catalog. Explore the power of a Lakehouse architecture for data management and analysis, featuring schema discovery, metadata management, and efficient querying with Amazon Athena.
Stream CDC into an Amazon S3 data lake in Apache Iceberg format with AWS Glue Streaming using Amazon MSK and MSK Connect (Debezium)
A proof-of-concept open framework to manage data ingestion into Apache Iceberg tables
React Components to visualize Apache Iceberg tables
iceberg-spark-tpcds-benchmark
Apache Iceberg examples designed to be run on AWS Elastic MapReduce (EMR) via EMR Studio or EMR Notebooks
Resources from a virtual tech talk / workshop - Set Up and Use Apache Iceberg Tables on Your Data Lake
Examples of Apache Flink® applications showcasing the DataStream API and Table API in Java and Python, featuring AWS, GitHub, Terraform, and Apache Iceberg.
A tool for learning Iceberg table format
J3's techStack Lexicon.
Apache Flink Docker image with Apache Iceberg support for Linux (i.e., not for Mac M1, M2, or M3 chips).
Apache Flink Docker image with Apache Iceberg support for Mac M1, M2, or M3 chips.
Notebook to accompany the "Hands-On With Havasu & GeoParquet" livestream
An example project showing how to build a serverless data lakehouse on AWS using Terraform, Apache Iceberg, and Spark.