microsoft / CSharpClient-for-Kafka

.Net implementation of the Apache Kafka Protocol that provides basic functionality through Producer/Consumer classes. The project also offers balanced consumer implementation.

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

ConsumedOffset is handled incorrectly when an OffsetOutOfRange error is received by the fetcher

mhorstma opened this issue · comments

Description

In cases where a consumer cannot keep up with the incoming message rate and the fetcher attempts to begin fetching using an offset that no longer exists on the broker because it has been truncated, Kafka returns an OffsetOutOfRange error. This error is handled in FetcherRunnable.cs around line 120, where the partitionTopicInfo.FetchOffset and ConsumeOffset are updated to the new value computed by ResetConsumerOffsets(). The bug occurs later in the ConsumerIterator class, where the MakeNext() method updates the currentTopicInfo.ConsumeOffset to the currentDataChunk.FetchOffset value (ConsumerIterator.cs, line 256).

In cases where the OffsetOutOfRange error was handled, this will cause the ConsumeOffset to be set forward (correctly) by FetcherRunnable, and then reset back to the old value when MakeNext() is next called. The result is seen in log lines that look like this:

DEBUG Kafka.Client.Consumers.PartitionTopicInfo - reset fetch offset of xxxx:y: fetched offset = 111101027: consumed offset = 110956806 to 111101027
DEBUG Kafka.Client.Consumers.PartitionTopicInfo - reset consume offset of xxxx:y: fetched offset = 111101027: consumed offset = 111101027 to 111101027
INFO  Kafka.Client.Consumers.FetcherRunnable - xxxx:y: fetched offset = 111101027: consumed offset = 111101027 marked as done.
DEBUG Kafka.Client.Messages.BufferedMessageSet - MakeNext() in deepIterator: innerDone = True
DEBUG Kafka.Client.Messages.BufferedMessageSet - Message is uncompressed. Valid byte count = 8361220
DEBUG Kafka.Client.Consumers.PartitionTopicInfo - reset consume offset of xxxx:y: fetched offset = 111101027: consumed offset = 110956807 to 110956807

This causes the ConsumerIterator to start timing out whenever calling channel.TryTake() and the consumer process stalls, since it is then unable to make forward progress.

Proposed "Fix"

ConsumerIterator needs to be aware when FetcherRunnable updates the parititonTopicInfo ConsumeOffset in error situations like the OffsetOutOfRange error. If the value is updated, ConsumerIterator should not overwrite the updated value and should reset itself to the new offset appropriately so it can continue to make forward progress.

Note: This is a very similar issue to Issue #41