Min Zhao's repositories
incubator-iceberg
Apache Iceberg (Incubating)
airbyte
Airbyte is an open-source EL(T) platform that helps you replicate your data in your warehouses, lakes and databases.
arctic
Arctic is a streaming lake warehouse service open sourced by NetEase
flink
Apache Flink
flink-cdc-connectors
Change Data Capture (CDC) Connectors for Apache Flink
bitsail
BitSail is a distributed, high-performance data integration engine that provides global data integration solutions for batch, streaming, and incremental scenarios. BitSail is widely used and synchronizes hundreds of trillions of data every day.
blaze
A blazing-fast query execution engine that speaks the Apache Spark language and has Arrow-DataFusion at its core.
DataSphereStudio
DataSphereStudio is a one-stop data application development and management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling.
debezium
Change data capture for a variety of databases. Please log issues at https://issues.redhat.com/browse/DBZ.
dolphinscheduler
Apache DolphinScheduler is a distributed and extensible workflow scheduler platform with powerful visual DAG interfaces, dedicated to solving complex job dependencies in the data pipeline and providing various types of jobs out of the box.
elasticsearch-hadoop
Elasticsearch real-time search and analytics natively integrated with Hadoop
gravitino
A high-performance, geo-distributed and federated metadata lake
incubator-doris
Apache Doris (Incubating)
incubator-kyuubi-website
Apache Kyuubi Site
incubator-linkis
Linkis helps easily connect to various back-end computation/storage engines (Spark, Python, TiDB...) and exposes various interfaces (REST, JDBC, Java...), with multi-tenancy, high performance, and resource control.
incubator-livy
Mirror of Apache Livy (Incubating)
incubator-paimon
Apache Paimon (incubating) is a streaming data lake platform that supports high-speed data ingestion, change data tracking and efficient real-time analytics.
incubator-seatunnel
SeaTunnel is a distributed, high-performance data integration platform for the synchronization and transformation of massive data (offline & real-time).
kyuubi-client
Client libraries for end users of Apache Kyuubi
OpenLineage
An Open Standard for lineage metadata collection
spark
Apache Spark - A unified analytics engine for large-scale data processing
spark-clickhouse-connector
Spark ClickHouse Connector built on the DataSourceV2 API and gRPC protocol.
spark-distcp
A re-implementation of Hadoop DistCP in Apache Spark
spark-sql-dsv2-extension
A SQL extension built on the Spark 3 DataSource V2 API, e.g. Hive V2 catalog support among multiple clusters