wanghy8166 / flinkx

A distributed data synchronization tool based on Flink

FlinkX

Communication

  • We are recruiting big data platform development engineers. For more information about the position, add WeChat ID [ysqwhiletrue] or email your resume to sishu@dtstack.com.

  • We use DingTalk to communicate. To join the group, search for group number [30537511] or scan the QR code.

Introduction

  • FlinkX is a distributed data synchronization framework for both offline and real-time workloads. It is based on Flink, is widely used at 袋鼠云 (DTStack), and provides efficient data migration between heterogeneous data sources.

Each data source is abstracted as a Reader plugin, and each data target is abstracted as a Writer plugin. In theory, the FlinkX framework can therefore synchronize data for any type of data source. The plugins also form an ecosystem: whenever a new data source is connected, it can immediately exchange data with every data source that is already supported.
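
The Reader/Writer split described above can be sketched in a few lines of plain Java. This is a minimal illustration and not FlinkX's actual plugin API: the `Reader` and `Writer` interfaces, the `sync` method, and the in-memory source and target are all hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class PluginSketch {

    /** Hypothetical source-side plugin: yields records one at a time. */
    interface Reader {
        Iterator<String[]> read();
    }

    /** Hypothetical target-side plugin: accepts one record at a time. */
    interface Writer {
        void write(String[] record);
    }

    /** Copies every record from the reader to the writer and returns the record count. */
    static int sync(Reader reader, Writer writer) {
        int count = 0;
        Iterator<String[]> it = reader.read();
        while (it.hasNext()) {
            writer.write(it.next());
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // In-memory stand-ins for a real data source and a real data target.
        List<String[]> rows = List.of(new String[]{"1", "alice"}, new String[]{"2", "bob"});
        List<String> sink = new ArrayList<>();

        Reader inMemoryReader = rows::iterator;
        Writer csvWriter = record -> sink.add(String.join(",", record));

        int copied = sync(inMemoryReader, csvWriter);
        System.out.println(copied + " records: " + sink);
    }
}
```

Because `sync` depends only on the two interfaces, any Reader can be paired with any Writer, which is why adding one new plugin makes a new data source usable with all existing ones.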

FlinkX can collect static data, such as MySQL and HDFS, as well as data that changes in real time, such as MySQL binlog and Kafka. FlinkX currently includes the following features:

  • Most plugins support concurrent reading and writing, which can greatly improve read and write throughput;

  • Some plugins support failure recovery, which resumes a task from the point of failure and saves running time (see Failure Recovery);

  • The Reader plugins for relational databases support interval polling, which continuously collects changing data (see Interval Polling);

  • Some databases support Kerberos security authentication (see Kerberos);

  • The reading speed of Reader plugins can be limited to reduce the impact on business databases;

  • Dirty data encountered while writing can be saved;

  • The maximum number of dirty records can be limited;

  • Multiple running modes are supported: Local, Standalone, Yarn Session, and Yarn Per.
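
The two dirty-data features in the list above can be illustrated with a small plain-Java sketch. It is an assumption-laden example and not FlinkX code: the class `DirtyDataSketch`, the method `writeAll`, and the parameter `maxDirty` are hypothetical, and "dirty" is taken here to mean a record whose write throws an exception.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class DirtyDataSketch {

    /** Outcome of one write pass: how many records succeeded and which were set aside. */
    static final class Result {
        final int written;
        final List<String> dirty;

        Result(int written, List<String> dirty) {
            this.written = written;
            this.dirty = dirty;
        }
    }

    /**
     * Writes each record; a record whose write throws is saved as dirty data.
     * The pass is aborted once the number of dirty records exceeds maxDirty.
     */
    static Result writeAll(List<String> records, Consumer<String> writer, int maxDirty) {
        List<String> dirty = new ArrayList<>();
        int written = 0;
        for (String record : records) {
            try {
                writer.accept(record);
                written++;
            } catch (RuntimeException e) {
                dirty.add(record);
                if (dirty.size() > maxDirty) {
                    throw new IllegalStateException(
                        "dirty record limit exceeded: " + dirty.size() + " > " + maxDirty, e);
                }
            }
        }
        return new Result(written, dirty);
    }

    public static void main(String[] args) {
        // Hypothetical target that only accepts integer values.
        Consumer<String> intColumnWriter = Integer::parseInt;

        Result result = writeAll(List.of("1", "x", "3"), intColumnWriter, 1);
        System.out.println(result.written + " written, dirty = " + result.dirty);
    }
}
```

Saving the rejected records instead of failing on the first one lets a job finish and be audited afterwards, while the limit keeps a badly mismatched job from silently discarding most of its input.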

The following databases are currently supported:

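| Synchronization Mode | Database Type | Reader | Writer |
|---|---|---|---|
| Batch Synchronization | MySQL | doc | doc |
| | Oracle | doc | doc |
| | SqlServer | doc | doc |
| | PostgreSQL | doc | doc |
| | DB2 | doc | doc |
| | GBase | doc | doc |
| | ClickHouse | doc | doc |
| | PolarDB | doc | doc |
| | SAP Hana | doc | doc |
| | Teradata | doc | doc |
| | Phoenix | doc | doc |
| | 达梦 (Dameng) | doc | doc |
| | Greenplum | doc | doc |
| | KingBase | doc | doc |
| | Cassandra | doc | doc |
| | ODPS | doc | doc |
| | HBase | doc | doc |
| | MongoDB | doc | doc |
| | Kudu | doc | doc |
| | ElasticSearch | doc | doc |
| | FTP | doc | doc |
| | HDFS | doc | doc |
| | Carbondata | doc | doc |
| | Stream | doc | doc |
| | Redis | | doc |
| | Hive | | doc |
| Stream Synchronization | Kafka | doc | doc |
| | EMQX | doc | doc |
| | RestApi | | doc |
| | MySQL Binlog | doc | |
| | MongoDB Oplog | doc | |
| | PostgreSQL WAL | doc | |
| | Oracle LogMiner | doc | |
| | SqlServer CDC | doc | |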

Fundamental

In the underlying implementation, FlinkX relies on Flink: each data synchronization task is translated into a StreamGraph and executed on Flink.

Quick Start

Please click Quick Start

General Configuration

Please click General Configuration

Statistics Metric

Please click Statistics Metric

Kerberos

Please click Kerberos

Questions

Please click Questions

How to contribute to FlinkX

Please click Contribution

License

FlinkX is under the Apache 2.0 license. See the LICENSE file for details.
