Blizzard / node-rdkafka

Node.js bindings for librdkafka


Manual offset and partition assignment?

ottomata opened this issue

This might be a feature request, and it might not be possible.

My use case needs 'simple consumer' like behavior. I don't want Kafka to manage consumer group rebalancing, and I won't be doing any offset commits. I do want consumers to be able to provide topic, partition, and offset from which they will begin consuming, so that clients can manage their own offsets. When the protocol eventually supports consuming based on event timestamps, I'd like to support that as well.

Is this possible? I see that KafkaConsumer.prototype.assign is a function, but I don't see anything using it, and my attempts to use it leave me with an 'Erroneous state' error from librdkafka. I think this is either because the offsets aren't actually passed to librdkafka's assign, or because some internal subscribed state is not being updated.

Is doing something like consumer.assign([ { topic: 'test', partition: 0, offset: 1102583 } ]) possible?
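
For concreteness, here's a sketch of the full usage I'm hoping for (hedged: the connect/ready/data flow here is borrowed from the subscribe-based examples in the README; whether assign slots in like this is exactly what I'm asking):

var Kafka = require('node-rdkafka');

// group.id is still required by librdkafka's config schema, even though
// I don't want any group coordination to happen
var consumer = new Kafka.KafkaConsumer({
  'group.id': 'placeholder',
  'metadata.broker.list': 'localhost:9092',
  'enable.auto.commit': false
}, {});

consumer.connect();

consumer.on('ready', function() {
  // Manual assignment instead of subscribe(): topic, partition, and
  // the offset to start consuming from
  consumer.assign([ { topic: 'test', partition: 0, offset: 1102583 } ]);
  consumer.consume();
});

consumer.on('data', function(message) {
  console.log(message.partition, message.offset, message.value.toString());
});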

Also, can I disable automatic consumer group rebalancing? I'm looking for a way to override the rebalance_cb, but I don't see one.

assign should be the way to do this. If it isn't working, it's a bug. I haven't tested it much because we don't use it.

librdkafka also has a Simple Consumer, which does not have managed offsets and partitions. I planned on eventually implementing it if there was a desire for it (this is why the property is Kafka.KafkaConsumer and not Kafka.Consumer).

In any event, it's likely that the internal subscribed state is not being updated, as you said. I'll take a look at this and get back to you :)

Can you try the code in this branch to see if it fixes your problem:

https://github.com/Blizzard/node-rdkafka/tree/assignment-fix

It should also now allow offsets to be passed in the topic partition objects. See PR #20.

It works! Thank you!


As for the other question, I think it might be possible to keep Kafka from managing any state if I could override the rebalance_cb. Would this be doable as well?

Got too antsy :P

Currently, the rebalance_cb is on by default and can't be turned off. However, I'm in the process of rewriting it so you can opt out.

Note that opting out of the rebalance_cb will probably just make librdkafka manage the assignment state internally.

cf. https://github.com/edenhill/librdkafka/blob/master/src-cpp/rdkafkacpp.h#L541

How do you envision overriding the rebalance_cb so it doesn't do its usual work? Perhaps an option that makes you do the corresponding rebalance work in node-land? e.g.

void Rebalance::rebalance_cb(RdKafka::KafkaConsumer *consumer,
    RdKafka::ErrorCode err,
    std::vector<RdKafka::TopicPartition*> &partitions) {
  if (that_->HasRebalanceOverride()) {
    // Do nothing. The event still gets emitted and the user has to
    // handle the assign/unassign themselves from node-land.
  } else if (err == RdKafka::ERR__ASSIGN_PARTITIONS) {
    // Default behavior: accept the assignment librdkafka proposes
    that_->Assign(partitions);
  } else {
    // Revocation (or error): drop the current assignment
    that_->Unassign();
  }

  // Emit the rebalance event to the node.js side either way
  dispatcher.Add(rebalance_event_t(err, partitions));
  dispatcher.Execute();

  eof_cnt = 0;
}

Then you can...

consumer.on('rebalance', (code, topicPartitions) => {
   consumer.assign(topicPartitions);
});

Perhaps I should just do that by default in node land. Now that assignment is working I can probably refactor this so node.js handles the rebalances, and librdkafka just reports them.

What do you think?

Hm, seems to make sense to me!

Ultimately, I am trying to avoid having the Kafka brokers store any state about clients. I'm building a Kafka consumer -> socket.io bridge so that we can expose some data streams publicly. Since this will be on the internet, I want to push client state management 100% out to the clients. I don't want the internet to have any effect on the state stored in my Kafka cluster.

As is, I believe the high-level consumer doesn't get me all the way there, is that correct? Since group.id is required, Kafka is going to know about the existence of my consumer.
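
In other words, my hope is that a config like this keeps broker-side state minimal (a sketch based on my assumptions, not verified):

var consumer = new Kafka.KafkaConsumer({
  // Required by the config schema, but my understanding is that a client
  // that only ever calls assign() (never subscribe()) never actually
  // joins a consumer group on the broker side
  'group.id': 'public-bridge-placeholder',
  'metadata.broker.list': 'localhost:9092',
  // Never write offsets back to Kafka; clients track their own offsets
  'enable.auto.commit': false
}, {});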

Re: the consumer rebalance_cb in node, I'm pretty sure this is how the Confluent rdkafka Python client works: https://github.com/edenhill/librdkafka/blob/master/src-cpp/rdkafkacpp.h#L541-L543

I think in that use case, using the low-level consumer is probably the best way to go. Unfortunately I don't have any of that implemented, but I would be happy to accept a PR! Kafka.Consumer uses util.deprecate because of plans to eventually implement it. Working on the low-level consumer is pretty far down my priority list at this point, however.

But besides that, I think putting the rebalance callback power in node's hands is a good idea, since I want the library to strive to be pretty close to a 1:1 port (at the bindings level). I'll probably change it so rebalance_cb is off by default, and if you opt in you need to do the rebalance logic yourself (as you would if you opted into partitioner_cb; cf. https://github.com/Blizzard/node-rdkafka/blob/master/src/callbacks.cc#L481).
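
Roughly, the opt-in shape I have in mind would look something like this from node-land (a sketch of a possible API, not final; the config key and error-code constants are assumptions at this point):

var Kafka = require('node-rdkafka');

var consumer = new Kafka.KafkaConsumer({
  'group.id': 'my-group',
  'metadata.broker.list': 'localhost:9092',
  // Opting in to rebalance_cb means you own the assign/unassign logic
  'rebalance_cb': function(err, assignment) {
    if (err.code === Kafka.CODES.ERRORS.ERR__ASSIGN_PARTITIONS) {
      this.assign(assignment);
    } else if (err.code === Kafka.CODES.ERRORS.ERR__REVOKE_PARTITIONS) {
      this.unassign();
    }
  }
}, {});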

As for whether bypassing assignments in this way will keep Kafka from storing the assignment state for its consumer groups internally, I'm not sure. I wouldn't know without actually testing an implementation like that and checking what Kafka stores.

Great, thanks. I'll ask Magnus what he thinks about this too. I can wait for a new librdkafka simple consumer implementation if he is planning on making one.

@ottomata I guess this is what you're looking for: confluentinc/librdkafka#593

Indeed, thanks for that reminder!

Merged #31 which I believe addresses at least part of this problem.

Closing this issue, as I think #31 will fix it! Please reopen if you are still experiencing problems.