microsoft / CSharpClient-for-Kafka

.NET implementation of the Apache Kafka protocol that provides basic functionality through Producer/Consumer classes. The project also offers a balanced consumer implementation.

Fetch offset restarts from some random previously committed offset after reaching the end of messages, in a loop.

PrasadBallingam opened this issue · comments

I have been hitting this problem intermittently. I have 40 messages in a topic (1 partition), reading with a consumer group, ZookeeperConnectorCount = 1, CommitBatchSize = 2 (3, 5, 17, 25, etc. made no difference with respect to this issue).

From the filtered log snippet below, you can see that the message count keeps going past 40 and the offset value resets to a value already seen (Offset:4 below), and this continues. When I reach the end of the offsets again, it resets to some other value - this repeats indefinitely.

If I kill the consumer and run it again later, it works just fine - it reads up to the last offset and waits for new messages.

Have you seen this behavior? Are there any settings that can control this? I am blocked by this and would appreciate ANY help.
Repro_Detail_log.txt

1922:2016-05-17 01:05:56,781 [3] INFO MDM.MessageBrokerManager.ConsumerGroupHelperUnit [(null)] - Message 39 from Partition:0, Offset:38, key:(null), value:{
1971:2016-05-17 01:05:56,996 [3] INFO MDM.MessageBrokerManager.ConsumerGroupHelperUnit [(null)] - Message 40 from Partition:0, Offset:39, key:(null), value:{
2020:2016-05-17 01:05:57,162 [3] INFO MDM.MessageBrokerManager.ConsumerGroupHelperUnit [(null)] - Message 41 from Partition:0, Offset:4, key:(null), value:{
2067:2016-05-17 01:05:57,349 [3] INFO MDM.MessageBrokerManager.ConsumerGroupHelperUnit [(null)] - Message 42 from Partition:0, Offset:5, key:(null), value:{

Taking a glance at your logs: first, I noticed that the offset was never committed. Second, I would suggest you check MessageBrokerManager.ConsumerGroupHelperUnit to see why it didn't detect noMoreMessage.

Thanks for the quick reply. This helps my investigation.

I can see the offset being written and increasing past the 40-message count as long as I let it run; the current value is 81:

get /consumers/May17_test2/offsets/MDMDedupingInitialLoadDemo/0
81
cZxid = 0x40016b868
ctime = Tue May 17 11:07:35 CDT 2016
mZxid = 0x400170212
mtime = Tue May 17 17:16:00 CDT 2016

Can you tell me what log line you searched and didn't find?

MessageBrokerManager.ConsumerGroupHelperUnit is nothing but KafkaNET.Library.Examples.ConsumerGroupHelperUnit

I see that I will hit the noMoreMessage = false; statement only when I get a ConsumerTimeoutException on the messageEnumerator.MoveNext() call in KafkaNET.Library.Examples.ConsumerGroupHelperUnit.

But it looks like that's not what's happening. I am looking into the details. If you find anything else, please share.
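The loop-exit pattern described above can be illustrated with a self-contained analogue. This is a sketch, not the library's code: TimeoutEnumerator and its TimeoutException are stand-ins for the library's ConsumerIterator and ConsumerTimeoutException; only the names noMoreMessage and MoveNext() come from the example code discussed here.

```csharp
using System;
using System.Collections.Concurrent;

// Stand-in for the consumer iterator: MoveNext() blocks on an internal
// channel, and throws when nothing arrives within the timeout. Only that
// exception flips noMoreMessage to false in the consuming loop.
class TimeoutEnumerator
{
    private readonly BlockingCollection<int> channel;
    private readonly int timeoutMs;
    public int Current { get; private set; }

    public TimeoutEnumerator(BlockingCollection<int> channel, int timeoutMs)
    {
        this.channel = channel;
        this.timeoutMs = timeoutMs;
    }

    public bool MoveNext()
    {
        int item;
        // Analogous to channel.TryTake(...) inside the iterator's MakeNext():
        if (!channel.TryTake(out item, timeoutMs))
            throw new TimeoutException("no message within timeout"); // stand-in for ConsumerTimeoutException
        Current = item;
        return true;
    }
}

class Program
{
    static void Main()
    {
        var channel = new BlockingCollection<int>();
        channel.Add(1);
        channel.Add(2);

        var it = new TimeoutEnumerator(channel, 100);
        bool noMoreMessage = true;
        int consumed = 0;
        while (noMoreMessage)
        {
            try
            {
                it.MoveNext();
                consumed++;
            }
            catch (TimeoutException)
            {
                noMoreMessage = false; // the only exit path from the loop
            }
        }
        Console.WriteLine(consumed); // 2
    }
}
```

This makes the failure mode concrete: if anything is still sitting in the channel after the last real message, MoveNext() succeeds instead of timing out, and the loop keeps consuming.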

Kafka.Client.Consumers.Library.ConsumerIterator.MakeNext()
{
    ...
    bool done = channel.TryTake(out currentDataChunk, consumerTimeoutMs, cancellationToken);
    ...
}

In this line, even after reading the last message AND correctly updating the offset in ZooKeeper, TryTake() returns true and hands back one of the already-read chunks as the current chunk, and iteration starts all over again from some random offset that is part of that chunk.

This behavior is intermittent.
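For reference, the contract of the .NET BlockingCollection&lt;T&gt;.TryTake overload used above is: it returns true and dequeues an item when one is available, and returns false only after waiting out the timeout on an empty collection. So a true return past the last committed offset implies an already-consumed data chunk was still sitting in the channel. A minimal sketch of that contract (the names here are illustrative, not from the library):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

class Program
{
    static void Main()
    {
        var channel = new BlockingCollection<string>();
        channel.Add("chunk-0"); // a data chunk already sitting in the channel

        string chunk;
        var cts = new CancellationTokenSource();

        // An item is available: TryTake returns true immediately.
        bool done = channel.TryTake(out chunk, 100, cts.Token);
        Console.WriteLine(done + " " + chunk); // True chunk-0

        // The channel is now empty: TryTake waits out the 100 ms timeout
        // and returns false - this is what should trigger the timeout path.
        done = channel.TryTake(out chunk, 100, cts.Token);
        Console.WriteLine(done); // False
    }
}
```

Given this contract, the intermittent repeat points at stale chunks not being cleared from the channel (e.g. on rebalance or fetch restart) rather than at TryTake itself.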

Does the ConsumerGroupHelper work for any of you? For those of you using this in production, did you rewrite the consumer group helper functionality? What is your uptime and/or experience so far?