BroadwayKafka 0.3.1 is skipping the current offset
alexandrexaviersm opened this issue
Hi folks 👋
BroadwayKafka version 0.3.1 (specifically PR #72) introduced an undesirable behavior: the consumer no longer resumes from the offset where it left off and instead always starts from the latest offset. I think this is a major issue because you can end up losing a lot of messages if your consumer is restarted.
For example, suppose you have a constant flow of messages, say a topic that receives about 10 messages per second, and you stop the server to make a deployment. The desired behavior is that when the consumer becomes active again, it resumes from the last committed offset, continuing the flow without losing any message. But with the behavior introduced in 0.3.1, the current_offset (last committed offset) is ignored and we only read the new messages that arrive after the consumer is active again.
In this example, if the consumer takes 10 seconds to come back up after the deployment, you will have missed 100 messages.
Maybe there was a misinterpretation of the :offset_reset_policy configuration, and now we're using it in cases where we shouldn't:
:offset_reset_policy - Optional. Defines the offset to be used when there's no initial offset in Kafka or if the current offset has expired.
Possible values are :earliest or :latest. Default is :latest.
As shown in the docs, I believe we should use this policy only when the offset is :undefined (new consumers) or the current_offset has expired. If your application already knows which offset it should use and that offset is still valid, then I think it's wrong to apply the :latest or :earliest option. I think this undesirable behavior is also related to issue #74.
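To make the proposed behavior concrete, here is a minimal sketch of the offset-resolution logic I'd expect. This is not the actual BroadwayKafka code; the function and variable names (resolve_offset/2, committed_offset) are hypothetical, for illustration only:

```elixir
# Hypothetical sketch: decide which offset a consumer should start from.
# `committed_offset` is what the consumer group last acked for the partition;
# `policy` is the configured :offset_reset_policy (:earliest or :latest).
defp resolve_offset(committed_offset, policy) do
  case committed_offset do
    # No offset committed yet (new consumer group), or the committed
    # offset has expired: only then fall back to the reset policy.
    :undefined -> policy
    # A valid committed offset exists: resume exactly where we left off,
    # regardless of the reset policy.
    offset when is_integer(offset) -> offset
  end
end
```

The point is that the reset policy is a fallback for missing or expired offsets, not something that should override a known, still-valid committed offset.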
I created PR #75, which I believe fixes the problem that issue #71 was trying to address, without introducing the side effects described here and in issue #74.
How to reproduce:
After initializing Kafka, create a topic
kafka-topics.sh --create --zookeeper localhost:2181 --partitions 1 --topic test --replication-factor 1
Starting a new project
mix new kafka_consumer --sup
defp deps do
[
{:broadway, "~> 1.0"},
{:broadway_kafka, "~> 0.3"}
]
end
Define a basic pipeline configuration
defmodule MyBroadway do
use Broadway
alias Broadway.Message
def start_link(_opts) do
Broadway.start_link(__MODULE__,
name: __MODULE__,
producer: [
module:
{BroadwayKafka.Producer,
[
hosts: [localhost: 9092],
group_id: "group_1",
topics: ["test"]
]},
concurrency: 1
],
processors: [
default: [
concurrency: 10
]
]
)
end
@impl true
def handle_message(_, message, _) do
message
|> Message.update_data(fn data ->
IO.inspect(data, label: "Got message")
{data, String.to_integer(data) * 2}
end)
end
end
Add it as a child in a supervision tree
children = [MyBroadway]
Supervisor.start_link(children, strategy: :one_for_one)
You can now test the pipeline by entering an iex session:
iex -S mix
Open another terminal window and send messages to Kafka
kafka-console-producer.sh --topic test --bootstrap-server localhost:9092
>1
>2
>3
You should see this output
iex> Got message: "1"
iex> Got message: "2"
iex> Got message: "3"
Now hit Ctrl-C twice to stop the Broadway consumer, then send more messages to Kafka:
kafka-console-producer.sh --topic test --bootstrap-server localhost:9092
>4
>5
>6
Start your Elixir application again:
iex -S mix
You can wait for a while, but the messages that were sent while the consumer was offline will not be consumed.
Try to send a new message:
kafka-console-producer.sh --topic test --bootstrap-server localhost:9092
>7
You should see this output
iex> Got message: "7"
This means that offsets 3, 4, and 5 (messages "4", "5", and "6") were skipped.
The desired behavior for a Kafka consumer is that it doesn't skip any available messages: if the last acked offset was 2, it should continue from offset 3 when it starts again, consuming the messages received while it was offline.