kafka-connect-cluster

A docker image for setting up Kafka Connect clusters.

This part of fast-data-dev is targeted to more advanced users and is a special case since it doesn't set-up a Kafka cluster, instead it expects to find a Kafka Cluster with Schema Registry up and running.

The developer can then use this docker image to setup a connect-distributed cluster by just spawning a couple containers.

docker run -d --net=host \
       -e ID=01 \
       -e BS=broker1:9092,broker2:9092 \
       -e ZK=zk1:2181,zk2:2181 \
       -e SR=http://schema-registry:8081 \
       -e HOST=<IP OR FQDN>
       landoop/fast-data-dev-connect-cluster

For an example take a look in the docker-compose.yml. It will spawn a fast-data-dev to act as the Kafka stack, 3 fast-data-dev-connect-cluster containers to form a Connect cluster and a Connect UI (at port 8000) for the cluster. Remember that Connect needs some time to populate the connectors, so you may have to wait a few minutes before they show up in the UI (when you press new). Also you can use fast-data-dev's 3030 port to inspect your schemas and topics.

For now this image is tied to landoop/fast-data-dev:latest, which is on CP3.1.2. In the future we may offer more versions.

Things to look out for in configuration options:

It is important to give a full URL (including schema —http://) for schema registry.
ID should be unique to the Connect cluster you setup, for current and old instances. This is because Connect stores data in Brokers and Schema Registry. Thus even if you destroyed a Connect cluster, its data remain in your Kafka setup.
HOST should be set to an IP address or domain name that other connect instances and clients can use to reach the current instance. We chose not to try to autodetect this IP because such a feat would fail more often than not. Good choices are your local network ip (e.g 10.240.0.2) if you work inside a local network, your public ip (if you have one and want to use it) or a domain name that is resolvable by all the hosts you will use to talk to Connect.

If you don't want to run with --net=host you have to expose Connect's port which at default settings is 8083. There a PORT option, that allows you to set Connect's port explicitly if you can't use the default 8083. Please remember that it is important to expose Connect's port on the same port at the host. This is a choice we had to make for simplicity's sake.

docker run -d \
       -e ID=01 \
       -e BS=broker1:9092,broker2:9092 \
       -e ZK=zk1:2181,zk2:2181 \
       -e SR=http://schema-registry:8081 \
       -e HOST=<IP OR FQDN>
       -e PORT=8085
       -p 8085:8085
       landoop/fast-data-dev-connect-cluster

Advanced issues

The container does not exit with CTRL+C. This is because we chose to pass control directly to Connect, so you check your logs via docker logs. You can stop it or kill it from another terminal.

Whilst the PORT variable sets the rest.port, the HOST variable sets the advertised host. This is the hostname that Connect will send to other Connect instances. By default Connect listens to all interfaces, so you don't have to worry as long as other instances can reach each instance via the advertised host.

ptzagk / fast-data-connect-cluster

kafka-connect-cluster

Advanced issues

About

Languages