josdotso / uReplicator

Improvement of Apache Kafka Mirrormaker

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

uReplicator

Build Status

uReplicator provides the ability to replicate across Kafka clusters in other data centers. Instead of publishing to a single Kafka cluster, you can publish data to multiple regional Kafka clusters and aggregate it all in one Kafka cluster.

=========

Kafka's current (part of 0.8.2) MirrorMaker design consumes data from a given regional Kafka cluster using a Kafka high-level consumer. With this design, rebalancing in the high level consumer (due to a addition/deletion of topics, source cluster problems, network issues and so on) affects all the topics being replicated via that Mirrormaker.

Goals of uReplicator

  1. Stability: Rebalance only occurs during startup (when a node is added/deleted)
  2. Simple operations: Easy to scale up cluster, no server restart for whitelisting topics
  3. High throughput: Max offset lag is consistently 0.
  4. Time SLA (~5min)

uReplicator Quick Start

Get the Code

Check out the uReplicator project:

git clone git@github.com:uber/uReplicator.git
cd uReplicator

This project contains everything (both mirrormaker-controller and mirrormaker-worker) you’ll need to run uReplicator.

Build uReplicator

Before you can run uReplicator, you need to build a package for it. This package is what your deployment tool uses to deploy uReplicator.

mvn clean package

Or command below (the previous one will take a long time to run):

mvn clean package -DskipTests

Set Up Local Test Environment

To test uReplicator locally, you need two systems: Kafka, and ZooKeeper. The script “grid” is to help you set up these systems.

  • Modify permission for the scripts generated by Maven:
chmod u+x bin/pkg/*.sh
  • The command below will download, install, and start ZooKeeper and Kafka (will start two Kafka systems: kafka1, which we use as source Kafka cluster, and kafka2, which we use as destination Kafka cluster):
bin/grid bootstrap
  • Create a dummyTopic in kafka1 and produce some dummy data:
./bin/produce-data-to-kafka-topic-dummyTopic.sh
  • Check if the data is successfully produced to kafka1 by opening another console tab and executing the command below:
./deploy/kafka/bin/kafka-console-consumer.sh --zookeeper localhost:2181/cluster1 --topic dummyTopic
  • You should get this data:
Kafka topic dummy topic data 1
Kafka topic dummy topic data 2
Kafka topic dummy topic data 3
Kafka topic dummy topic data 4
…

Start uReplicator

Example 1: Copy data from source cluster to destination cluster

  • Start uReplicator Controller (you should keep it running):
./uReplicator-Distribution/target/uReplicator-Distribution-pkg/bin/start-controller-example1.sh
  • Start uReplicator Worker (you should keep it running, and it’s normal if you see kafka.consumer.ConsumerTimeoutException at this moment, since no topic has been added for copying):
./uReplicator-Distribution/target/uReplicator-Distribution-pkg/bin/start-worker-example1.sh
  • Add topic to uReplicator Controller to start copying from kafka1 to kafka2:
curl -X POST -d '{"topic":"dummyTopic", "numPartitions":"1"}' http://localhost:9000/topics
  • To check if the data is successfully copied to kafka2, you should open another console tab and execute the command below:
./deploy/kafka/bin/kafka-console-consumer.sh --zookeeper localhost:2181/cluster2 --topic dummyTopic1
  • And you will see the same messages produced in kafka1:
Kafka topic dummy topic data 1
Kafka topic dummy topic data 2
Kafka topic dummy topic data 3
Kafka topic dummy topic data 4
…

Example 2: Copy data from source Kafka to destination Kafka cluster without explicitly whitelisting topics

  • Start uReplicator Controller (you should keep it running):
./uReplicator-Distribution/target/uReplicator-Distribution-pkg/bin/start-controller-example2.sh
  • Start uReplicator Worker (you should keep it running, and it’s normal if you see kafka.consumer.ConsumerTimeoutException at this moment since no topic has been added for copying):
./uReplicator-Distribution/target/uReplicator-Distribution-pkg/bin/start-worker-example2.sh
  • Create topic in kafka2. Example 2 enables topic auto-whitelisting, so you don't need to whitelist topics manually. If a topic is in both source and destination Kafka clusters, the controller auto-whitelists the topic and starts copying data.
./deploy/kafka/bin/kafka-topics.sh --create --zookeeper localhost:2181/cluster2 --topic dummyTopic --partition 1 --replication-factor 1
  • To check if the data is successfully copied to kafka2, open another console tab and execute this command (you might need to wait about 20 seconds for controller to refresh):
./deploy/kafka/bin/kafka-console-consumer.sh --zookeeper localhost:2181/cluster2 --topic dummyTopic
  • And you should see the same messages produced in kafka1:
Kafka topic dummy topic data 1
Kafka topic dummy topic data 2
Kafka topic dummy topic data 3
Kafka topic dummy topic data 4
…

Shutdown

When you’re done, you can clean everything up using the same grid script:

./bin/pkg/stop-all.sh

Congratulations! You’ve now set up a local grid that includes Kafka and ZooKeeper, and you've run a uReplicator worker on it.

About

Improvement of Apache Kafka Mirrormaker

License:Apache License 2.0


Languages

Language:Java 76.0%Language:Scala 20.5%Language:Shell 3.5%