Xiaojian Sun's repositories
airbyte
Data integration platform for ELT pipelines from APIs, databases & files to warehouses & lakes.
airbyte-platform
The platform that powers Airbyte. Please file issues in https://github.com/airbytehq/airbyte
debezium
Change data capture for a variety of databases. Please log issues at https://issues.redhat.com/browse/DBZ.
flink
Apache Flink
paimon
Apache Paimon (incubating) is a streaming data lake platform that supports high-speed data ingestion, change data tracking and efficient real-time analytics.
seatunnel
SeaTunnel is a distributed, high-performance data integration platform for the synchronization and transformation of massive data (offline & real-time).
amoro
Arctic is a streaming lake warehouse service open sourced by NetEase
datafusion
Apache DataFusion SQL Query Engine
DB-GPT
AI Native Data App Development framework with AWEL (Agentic Workflow Expression Language) and Agents
elasticsearch
Free and Open, Distributed, RESTful Search Engine
flink-cdc
CDC Connectors for Apache Flink®
graphrag
A modular graph-based Retrieval-Augmented Generation (RAG) system
gravitino
World's most powerful open data catalog for building a high-performance, geo-distributed and federated metadata lake.
hadoop
Apache Hadoop
helm-java
Helm client for Java
iceberg
Apache Iceberg
kafka-connect-file-pulse
🔗 A multipurpose Kafka Connect connector that makes it easy to parse, transform and stream any file, in any format, into Apache Kafka
kafka-connect-paimon
Kafka Connect for Paimon
llamacoder
Open source Claude Artifacts – built with Llama 3.1 405B
llm-action
This project aims to share the technical principles of large models along with hands-on practical experience.
migration
Migration tools for TiKV, e.g. online bulk load.
paimon-webui
Web UI for Apache Paimon.
parquet-mr
Apache Parquet
pinot
Apache Pinot - A realtime distributed OLAP datastore
polardbx-sql
PolarDB-X is a cloud native distributed SQL Database designed for high concurrency, massive storage, and complex querying scenarios.
pravega
Pravega - Streaming as a new software defined storage primitive
ranger
Apache Ranger - To enable, monitor and manage comprehensive data security across the Hadoop platform and beyond
risingwave
Scalable Postgres for stream processing, analytics, and management. KsqlDB and Apache Flink alternative. 🚀 10x more productive. 🚀 10x more cost-efficient.
starrocks
StarRocks is a next-gen sub-second MPP database for full analytics scenarios, including multi-dimensional analytics, real-time analytics and ad-hoc query.
temporal
Temporal service