Use Kafka to retrieve broker metadata instead of Zookeeper
cbornet opened this issue · comments
Currently, Nakadi gets the list of Kafka broker IPs/ports and the list of topics from Zookeeper. Since Kafka 0.9, the recommended way is to get this information directly from Kafka and not interact with Zookeeper.
Doing it in Nakadi would have the benefit of decoupling the ZK used by Nakadi for its own configuration data from the one used by Kafka. You could, for instance, have two ZK clusters: one for Kafka and one for Nakadi, with different configurations.
It's also more future-proof if Kafka changes the way it uses ZooKeeper.
@cbornet hi, thank you for the proposal!
The thing is that we sometimes change the IPs of the Kafka brokers, and Nakadi instances launch and terminate even more often, so to get the latest list of brokers we take it from ZK.
The topic-listing call is used in tests (I admit it smells) and also to check whether the connection to storage was properly created. Listing topics could be removed entirely (as far as I can see), and the check in KafkaTopicRepository could be replaced with, e.g., a create-topic call.
About the benefit: it is already possible, and we use it:
- define default Kafka storage ZK1
- pass ZK2 properties in env vars for Nakadi
So in any case, if you know your ZK / Exhibitor IPs, you can provide them to Nakadi.
Let me know what you think.
I had not seen your PR.
So if you really need it, I am for it, but then please keep the old way of getting brokers from ZK.
The thing is that we change IPs of Kafka brokers sometimes
Isn't there the same problem with ZooKeeper's IPs ?
please keep the old way of getting brokers from ZK
Is it OK if we do the following ?
- If property nakadi.kafka.bootstrap-servers is not empty, use it to connect to Kafka
- If property nakadi.kafka.bootstrap-servers is empty, get the Kafka IPs from ZK
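To make the two cases concrete, here is a minimal sketch of that fallback. The class name `BootstrapServersResolver`, the `resolve` method, and the `zkLookup` supplier are hypothetical names used only for illustration; they are not taken from Nakadi's code, and the real wiring would depend on how the property is injected.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;
import java.util.stream.Collectors;

/** Hypothetical helper; the class and method names are illustrative, not Nakadi's. */
final class BootstrapServersResolver {

    private BootstrapServersResolver() {
    }

    /**
     * @param configured value of nakadi.kafka.bootstrap-servers, may be null or blank
     * @param zkLookup   fallback that reads the broker list from ZooKeeper (assumed to exist)
     * @return broker addresses in host:port form
     */
    static List<String> resolve(String configured, Supplier<List<String>> zkLookup) {
        if (configured != null && !configured.trim().isEmpty()) {
            // Property is set: use it as-is and do not query ZooKeeper at all.
            return Arrays.stream(configured.split(","))
                    .map(String::trim)
                    .filter(s -> !s.isEmpty())
                    .collect(Collectors.toList());
        }
        // Property is empty: keep the existing ZooKeeper-based discovery.
        return zkLookup.get();
    }
}
```

With this shape, the ZooKeeper lookup is only invoked when the property is empty, so the old behaviour stays the default.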
Oh, I think I see what you meant. Nakadi refreshes the bootstrap servers list in live Nakadi instances with a scheduled task.
I think it's possible to update the bootstrap servers from a Kafka consumer with
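private List<Broker> fetchBrokers() {
    // Close the consumer once the metadata is fetched so connections are not leaked.
    try (KafkaConsumer<?, ?> consumer = new KafkaConsumer<>(kafkaProperties)) {
        return consumer.listTopics()
                .values().stream()
                .flatMap(Collection::stream)
                .map(PartitionInfo::replicas)
                // replicas() returns Node[], so flatten it with Arrays::stream
                .flatMap(Arrays::stream)
                .distinct()
                .map(node -> new Broker(node.host(), node.port()))
                .collect(Collectors.toList());
    }
}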
WDYT ?
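For the scheduled refresh itself, a rough sketch could look like the following. It assumes the lookup (for example a method like fetchBrokers() above) is passed in as a `Supplier`; `BrokerListRefresher` and its methods are hypothetical names, and this is not how Nakadi's existing scheduled task is written.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/** Hypothetical sketch of a periodic broker-list refresh; not Nakadi's actual task. */
final class BrokerListRefresher implements AutoCloseable {

    private final Supplier<List<String>> lookup;
    private final AtomicReference<List<String>> current =
            new AtomicReference<>(Collections.emptyList());
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    BrokerListRefresher(Supplier<List<String>> lookup) {
        this.lookup = lookup;
    }

    /** Runs refreshOnce() immediately and then again after every period. */
    void start(long period, TimeUnit unit) {
        scheduler.scheduleWithFixedDelay(this::refreshOnce, 0, period, unit);
    }

    /** Performs one lookup; a failed or empty result keeps the previous list. */
    void refreshOnce() {
        try {
            List<String> fresh = lookup.get();
            if (fresh != null && !fresh.isEmpty()) {
                current.set(Collections.unmodifiableList(new ArrayList<>(fresh)));
            }
        } catch (RuntimeException e) {
            // Swallowing the error keeps the schedule alive; a real task would log it.
        }
    }

    /** Returns the most recently fetched broker list (host:port strings). */
    List<String> brokers() {
        return current.get();
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

Keeping the last known list when a lookup fails means a temporary Kafka or ZK outage would not wipe the bootstrap servers in a live instance.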
Isn't there the same problem with ZooKeeper's IPs ?
Kafka actually has an issue with caching resolved ZK addresses, which is why we use static IP addresses for ZK (if you are not familiar with the issue, there is more on it from my colleague here: https://jobs.zalando.com/tech/blog/rock-solid-kafka/?gh_src=4n3gxh1)
Is it OK if we do the following ?
If property nakadi.kafka.bootstrap-servers is not empty, use it to connect to Kafka
If property nakadi.kafka.bootstrap-servers is empty, get the Kafka IPs from ZK
Yes.
I think it's possible to update the bootstrap servers from a Kafka consumer with
It looks quite strange to me: you get the brokers through the topics, when you could just get them by providing the ZK configuration. So I would leave the Kafka bootstrap brokers string as you did it and improve it in later PRs if required.
Kafka actually has an issue with caching resolved ZK addresses
OK. I think these issues are solved in Kafka 1.1.0 and ZooKeeper 3.4.13
It looks quite strange for me, because you get brokers through the topics
Indeed, it's weird. But for some reason you can't get the list of brokers directly... Also one of the drawbacks with this approach is that we don't get the brokers that don't have any partitions yet.
If property nakadi.kafka.bootstrap-servers is not empty, use it to connect to Kafka
Do you know if there are any problems with using DNS for bootstrap-servers (like there is/was for ZK) ?
OK. I think these issues are solved in Kafka 1.1.0 and ZooKeeper 3.4.13
This is very nice. We have not tested it yet, but I also see some comments from Kafka folks in apache/zookeeper#451 (comment). We could probably test it, since we use 1.1.1 in production.
Also, we use an LB name to resolve ZK addresses, so the static ZK IPs are not a problem; they can be dynamic when it comes to figuring out the bootstrap servers.
Do you know if there are any problems with using DNS for bootstrap-servers (like there is/was for ZK) ?
I am not aware of any.
I updated the PR accordingly