FlinkX


Communication

We are recruiting big data platform development engineers. For more information about the position, add WeChat ID [ysqwhiletrue] or email your resume to sishu@dtstack.com.
We communicate through DingTalk. To join the communication group, search for group number [30537511].

Introduction

FlinkX is a distributed data synchronization framework for both offline and real-time workloads. It is built on Flink, is widely used at DTStack (袋鼠云), and provides efficient data migration between multiple heterogeneous data sources.

Each data source is abstracted as a Reader plugin, and each data target is abstracted as a Writer plugin. In theory, the FlinkX framework can therefore synchronize data from any type of data source. Because the plugins form one ecosystem, every newly connected data source can immediately exchange data with all existing data sources.
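The plugin abstraction described above can be illustrated with a minimal sketch. The names `RecordReader`, `RecordWriter`, `ListReader`, `ListWriter`, and `SyncJob` are hypothetical and exist only for this example; they are not FlinkX's actual plugin API. The sketch shows why any reader can be paired with any writer once both sides share a common record interface.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical interfaces for illustration only; not FlinkX's actual plugin API.
interface RecordReader<T> {
    Iterator<T> read();
}

interface RecordWriter<T> {
    void write(T record);
}

// A reader backed by an in-memory list, standing in for a real source such as MySQL.
class ListReader implements RecordReader<String> {
    private final List<String> rows;

    ListReader(List<String> rows) {
        this.rows = rows;
    }

    @Override
    public Iterator<String> read() {
        return rows.iterator();
    }
}

// A writer that collects records in memory, standing in for a real target such as HDFS.
class ListWriter implements RecordWriter<String> {
    final List<String> written = new ArrayList<>();

    @Override
    public void write(String record) {
        written.add(record);
    }
}

class SyncJob {
    // Moves every record from any reader to any writer and returns the record count.
    static <T> long run(RecordReader<T> reader, RecordWriter<T> writer) {
        long count = 0;
        Iterator<T> it = reader.read();
        while (it.hasNext()) {
            writer.write(it.next());
            count++;
        }
        return count;
    }
}
```

Under this assumption, adding one new reader makes it usable with every existing writer without changing `SyncJob`, which is the intercommunication property described above.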

FlinkX can collect static data, such as MySQL and HDFS, as well as data that changes in real time, such as MySQL binlog and Kafka. It currently includes the following features:

Most plugins support concurrent reading and writing, which can greatly improve throughput;
Some plugins support failure recovery, which resumes a task from the point of failure and saves running time (see Failure Recovery);
The Reader plugins for relational databases support interval polling, which continuously collects changing data (see Interval Polling);
Some databases support Kerberos security authentication (see Kerberos);
The reading speed of Reader plugins can be limited to reduce the impact on business databases;
Dirty data encountered while writing can be saved;
The maximum number of dirty records can be limited;
Multiple running modes are supported: Local, Standalone, Yarn Session, Yarn Per.
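The two dirty-data features in the list above can be sketched as follows. This is a minimal illustration, not FlinkX's actual implementation: the class `DirtyDataCollector` and its methods are hypothetical, and the rule that the job fails once the count exceeds (rather than reaches) the limit is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper for illustration only; not FlinkX's actual dirty-data handling.
class DirtyDataCollector {
    private final int maxDirtyRecords;
    private final List<String> dirtyRecords = new ArrayList<>();

    DirtyDataCollector(int maxDirtyRecords) {
        this.maxDirtyRecords = maxDirtyRecords;
    }

    // Tries to write one record; on failure the record is saved instead of lost.
    // Throws once the number of dirty records exceeds the configured maximum (assumed rule).
    void writeOrCollect(String record, Consumer<String> writer) {
        try {
            writer.accept(record);
        } catch (RuntimeException e) {
            dirtyRecords.add(record);
            if (dirtyRecords.size() > maxDirtyRecords) {
                throw new IllegalStateException(
                        "Dirty record limit exceeded: "
                                + dirtyRecords.size() + " > " + maxDirtyRecords, e);
            }
        }
    }

    List<String> getDirtyRecords() {
        return dirtyRecords;
    }
}
```

A writer that parses each record as an integer, for example, would let `"1"` and `"2"` through, save `"x"` as dirty, and abort the job only when a second bad record pushes the count past a limit of 1.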

The following databases are currently supported:

Synchronization Type	Database Type	Reader	Writer
Batch Synchronization	MySQL	doc	doc
	Oracle	doc	doc
	SQL Server	doc	doc
	PostgreSQL	doc	doc
	DB2	doc	doc
	GBase	doc	doc
	ClickHouse	doc	doc
	PolarDB	doc	doc
	SAP Hana	doc	doc
	Teradata	doc	doc
	Phoenix	doc	doc
	Dameng (达梦)	doc	doc
	Greenplum	doc	doc
	KingBase	doc	doc
	Cassandra	doc	doc
	ODPS	doc	doc
	HBase	doc	doc
	MongoDB	doc	doc
	Kudu	doc	doc
	Elasticsearch	doc	doc
	FTP	doc	doc
	HDFS	doc	doc
	CarbonData	doc	doc
	Stream	doc	doc
	Redis	-	doc
	Hive	-	doc
Stream Synchronization	Kafka	doc	doc
	EMQX	doc	doc
	RestApi	-	doc
	MySQL Binlog	doc	-
	MongoDB Oplog	doc	-
	PostgreSQL WAL	doc	-
	Oracle LogMiner	doc	-
	SQL Server CDC	doc	-

Fundamentals

Under the hood, FlinkX relies on Flink: each data synchronization task is translated into a StreamGraph and executed on Flink. The basic principle is as follows: