TrackingSoft / Kafka

Perl implementation of Kafka API (official CPAN module)

When consuming, how do I let Kafka choose the partitions?

veqryn opened this issue

This library appears to force you to specify which partition you are consuming from when you want to fetch messages.

Kafka is supposed to do that for you.
Kafka looks at how many partitions there are in a topic and how many consumers are currently in the consumer group, and then rebalances the partitions across the consumers.
So if you have 4 partitions and only 2 consumers, Kafka will send messages from 2 partitions to each consumer as they request messages, without the consumers needing to know about the partitions when they fetch. If 2 more consumers are added, Kafka rebalances automatically so that each consumer gets its own partition. This way, two consumers within a group never read from the same partition.
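For illustration, the split described above can be modeled as a range-style assignment, similar in spirit to the range strategy offered by Kafka's Java client. This is a simplified sketch under stated assumptions: `range_assign` is a hypothetical helper, consumer IDs are plain strings, and it ignores the group membership and coordination protocol the broker actually runs. It only shows how partitions are divided once the member list is known.

```python
from typing import Dict, List


def range_assign(partitions: List[int], consumers: List[str]) -> Dict[str, List[int]]:
    """Hypothetical helper: split sorted partitions into contiguous ranges,
    one range per consumer, with consumers ordered by ID."""
    members = sorted(consumers)
    parts = sorted(partitions)
    if not members:
        # No consumers in the group: nothing is assigned.
        return {}
    per_consumer, extra = divmod(len(parts), len(members))
    assignment: Dict[str, List[int]] = {}
    start = 0
    for i, member in enumerate(members):
        # The first `extra` consumers each take one additional partition.
        count = per_consumer + (1 if i < extra else 0)
        assignment[member] = parts[start:start + count]
        start += count
    return assignment


if __name__ == "__main__":
    # 4 partitions, 2 consumers: each consumer gets 2 partitions.
    print(range_assign([0, 1, 2, 3], ["c1", "c2"]))
    # 2 more consumers join: each consumer now gets exactly 1 partition.
    print(range_assign([0, 1, 2, 3], ["c1", "c2", "c3", "c4"]))
```

Rerunning it with a larger member list models a rebalance: with four consumers each one gets a single partition, and a fifth consumer would receive none, so no partition is ever shared within the group.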

How do I do this with this library?

Duplicate of #2.

This module does not implement the high-level consumer API, including consumer groups. Initially, the obstacle was Zookeeper: for Kafka versions < 0.9, the high-level API required working with Zookeeper directly, and in Perl that is rather tricky. Recent versions of Kafka (0.10+, I believe) do not require Zookeeper for this, and all the relevant data is available via the broker connection. However, there are no plans to implement this yet.

As a side note, our experience shows that such high-level partitioning logic may work better if implemented on application level instead of relying on some Kafka "magic". YMMV.
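One way to read "application level" here is static assignment: each consumer process is started with its own index and the total number of instances, works out which partitions it owns, and then fetches from each of them explicitly. The sketch below assumes that scheme; `partitions_for_instance` and its parameters are hypothetical, not part of this module's API, and the thread does not say this is the approach the maintainers use.

```python
from typing import List


def partitions_for_instance(num_partitions: int, instance_index: int,
                            instance_count: int) -> List[int]:
    """Hypothetical helper: return the partitions owned by one instance,
    assigning partition p to instance (p % instance_count)."""
    if instance_count <= 0 or not 0 <= instance_index < instance_count:
        raise ValueError("instance_index must be in [0, instance_count)")
    return [p for p in range(num_partitions) if p % instance_count == instance_index]


if __name__ == "__main__":
    # Two instances sharing a 4-partition topic.
    for idx in range(2):
        print(idx, partitions_for_instance(4, idx, 2))
```

The trade-off is that nothing rebalances automatically: if an instance stops, its partitions go unconsumed until it is restarted or the remaining instances are reconfigured with a new count.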

That is unfortunate.
Expecting users to implement their own distributed partition locking is a big ask, especially when Kafka provides this out of the box, and every other language I've used has libraries with balanced consumers.

@veqryn @asolovey has consumer group support been implemented in Perl?

This module does not support consumer groups. Unfortunately, there are no plans to work on this feature for now.