uReplicator provides the ability to replicate across Kafka clusters in other data centers. Instead of publishing to a single Kafka cluster, you can publish data to multiple regional Kafka clusters and aggregate it all in one Kafka cluster.
Kafka's current MirrorMaker design (part of 0.8.2) consumes data from a given regional Kafka cluster using a Kafka high-level consumer. With this design, a rebalance in the high-level consumer (caused by the addition or deletion of topics, source cluster problems, network issues, and so on) affects every topic being replicated through that MirrorMaker. uReplicator offers the following features:
- Stability: Rebalancing occurs only during startup (when a node is added or deleted)
- Simple operations: Easy to scale up the cluster; no server restart is needed to whitelist topics
- High throughput: Max offset lag is consistently 0
- Time SLA: ~5 minutes
Check out the uReplicator project:
git clone git@github.com:uber/uReplicator.git
cd uReplicator
This project contains everything you'll need to run uReplicator, including both mirrormaker-controller and mirrormaker-worker.
Before you can run uReplicator, you need to build a package for it. This package is what your deployment tool uses to deploy uReplicator.
mvn clean package
Or, to skip the tests (the full build above takes a long time to run):
mvn clean package -DskipTests
To test uReplicator locally, you need two systems: Kafka and ZooKeeper. The “grid” script helps you set them up.
- Make the scripts generated by Maven executable:
chmod u+x bin/pkg/*.sh
- Download, install, and start ZooKeeper and Kafka. This starts two Kafka clusters: kafka1, which we use as the source cluster, and kafka2, which we use as the destination cluster:
bin/grid bootstrap
- Create a topic named dummyTopic in kafka1 and produce some dummy data to it:
./bin/produce-data-to-kafka-topic-dummyTopic.sh
- To check that the data was produced to kafka1, open another console tab and run:
./deploy/kafka/bin/kafka-console-consumer.sh --zookeeper localhost:2181/cluster1 --topic dummyTopic
- You should get this data:
Kafka topic dummy topic data 1
Kafka topic dummy topic data 2
Kafka topic dummy topic data 3
Kafka topic dummy topic data 4
…
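You can also count the output instead of reading it by eye. Pipe the console consumer through a small filter that counts lines matching the dummy payload shown above. This is a minimal sketch, not part of uReplicator. The function name count_dummy_messages is hypothetical. It assumes each message is printed on its own line in the format above, and the count you get depends on how much data the producer script has written.

```shell
# Hypothetical helper (not part of uReplicator): count stdin lines that look
# like the dummy messages shown above, e.g. "Kafka topic dummy topic data 3".
count_dummy_messages() {
  # grep -c prints the number of matching lines. It exits 1 when nothing
  # matches, so "|| true" keeps the exit status at 0 while still printing "0".
  grep -c '^Kafka topic dummy topic data [0-9][0-9]*$' || true
}

# Example: consume from kafka1 for 15 seconds, then print the match count.
# timeout 15 ./deploy/kafka/bin/kafka-console-consumer.sh \
#   --zookeeper localhost:2181/cluster1 --topic dummyTopic | count_dummy_messages
```

The timeout wrapper in the usage comment matters because the console consumer keeps running until it is stopped. grep only prints its count once its input closes.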
Example 1: Copy data from the source cluster to the destination cluster
- Start the uReplicator controller (keep it running):
./uReplicator-Distribution/target/uReplicator-Distribution-pkg/bin/start-controller-example1.sh
- Start the uReplicator worker (keep it running). It's normal to see kafka.consumer.ConsumerTimeoutException at this point, since no topic has been added for copying yet:
./uReplicator-Distribution/target/uReplicator-Distribution-pkg/bin/start-worker-example1.sh
- Add the topic to the uReplicator controller to start copying from kafka1 to kafka2:
curl -X POST -d '{"topic":"dummyTopic", "numPartitions":"1"}' http://localhost:9000/topics
- To check that the data was copied to kafka2, open another console tab and run:
./deploy/kafka/bin/kafka-console-consumer.sh --zookeeper localhost:2181/cluster2 --topic dummyTopic1
- You should see the same messages that were produced to kafka1:
Kafka topic dummy topic data 1
Kafka topic dummy topic data 2
Kafka topic dummy topic data 3
Kafka topic dummy topic data 4
…
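To whitelist more than one topic, you can wrap the controller curl call from Example 1 in a small shell function. This is a sketch, not a uReplicator tool. The names topic_payload, whitelist_topic, and UREPLICATOR_CONTROLLER are hypothetical. The sketch assumes the controller is reachable at http://localhost:9000, as in Example 1. It also assumes topic names contain no quotes or backslashes, because the JSON body is built with printf rather than a JSON library.

```shell
# Hypothetical controller address; defaults to the one used in Example 1.
UREPLICATOR_CONTROLLER=${UREPLICATOR_CONTROLLER:-http://localhost:9000}

# Build the request body in the same shape as the Example 1 curl call.
# Assumes topic names contain no characters that need JSON escaping.
topic_payload() {
  printf '{"topic":"%s", "numPartitions":"%s"}' "$1" "$2"
}

# Whitelist one topic: $1 = topic name, $2 = number of partitions.
whitelist_topic() {
  curl -X POST -d "$(topic_payload "$1" "$2")" "${UREPLICATOR_CONTROLLER}/topics"
}
```

For example, `for t in topicA topicB; do whitelist_topic "$t" 1; done` would whitelist two hypothetical topics, topicA and topicB, each with one partition. Each call sends the same request as the manual curl command.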
Example 2: Copy data from the source Kafka cluster to the destination Kafka cluster without explicitly whitelisting topics
- Start the uReplicator controller (keep it running):
./uReplicator-Distribution/target/uReplicator-Distribution-pkg/bin/start-controller-example2.sh
- Start the uReplicator worker (keep it running). It's normal to see kafka.consumer.ConsumerTimeoutException at this point, since no topic has been added for copying yet:
./uReplicator-Distribution/target/uReplicator-Distribution-pkg/bin/start-worker-example2.sh
- Create the topic in kafka2. Example 2 enables topic auto-whitelisting, so you don't need to whitelist topics manually: if a topic exists in both the source and destination Kafka clusters, the controller auto-whitelists it and starts copying data.
./deploy/kafka/bin/kafka-topics.sh --create --zookeeper localhost:2181/cluster2 --topic dummyTopic --partitions 1 --replication-factor 1
- To check that the data was copied to kafka2, open another console tab and run this command (you might need to wait about 20 seconds for the controller to refresh):
./deploy/kafka/bin/kafka-console-consumer.sh --zookeeper localhost:2181/cluster2 --topic dummyTopic
- You should see the same messages that were produced to kafka1:
Kafka topic dummy topic data 1
Kafka topic dummy topic data 2
Kafka topic dummy topic data 3
Kafka topic dummy topic data 4
…
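Two small shell helpers can make Example 2 easier to script:

- common_topics prints the topics listed in both of two files. That is the condition described above for auto-whitelisting. It only illustrates the rule and is not how the controller implements it.
- retry reruns a command until it succeeds, which helps given the roughly 20-second controller refresh.

Both function names are hypothetical. The usage lines in the comments assume the local grid and the paths used throughout this guide.

```shell
# Print topics that appear in both files (one topic name per line), sorted.
# Illustrates the auto-whitelisting condition only; not the controller's code.
common_topics() {
  # -F fixed strings, -x whole-line match, -f read patterns from file $1
  grep -Fxf "$1" "$2" | sort -u
}

# Run "$@" up to $1 times, sleeping $2 seconds between failed attempts.
# Returns 0 on the first success, 1 if every attempt fails.
retry() {
  local attempts=$1 delay=$2 i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Example usage against the local grid:
# ./deploy/kafka/bin/kafka-topics.sh --list --zookeeper localhost:2181/cluster1 > source-topics.txt
# ./deploy/kafka/bin/kafka-topics.sh --list --zookeeper localhost:2181/cluster2 > dest-topics.txt
# common_topics source-topics.txt dest-topics.txt
# retry 6 5 sh -c 'timeout 10 ./deploy/kafka/bin/kafka-console-consumer.sh \
#   --zookeeper localhost:2181/cluster2 --topic dummyTopic --from-beginning \
#   | grep -q "Kafka topic dummy topic data"'
```

The retry example gives the controller about a minute to pick up the topic: six attempts, each consuming for up to 10 seconds, with 5 seconds between attempts.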
When you're done, you can clean everything up with:
./bin/pkg/stop-all.sh
Congratulations! You've now set up a local grid that includes Kafka and ZooKeeper, and you've run a uReplicator controller and worker on it.