dashbitco / broadway_kafka

A Broadway connector for Kafka

Drain After Revoke Error

amacciola opened this issue

When stopping and restarting pipelines, I periodically get the errors below.

The pipeline also gets stuck in a rebalancing loop for a while before it recovers. Any insight into why I am seeing these errors after stopping a pipeline? Thanks.

19:37:20.225 [error] GenServer #PID<0.17856.0> terminating
** (stop) exited in: GenServer.call(#PID<0.17844.0>, :drain_after_revoke, :infinity)
    ** (EXIT) no process: the process is not alive or there's no process currently associated with the given name, possibly because its application isn't started
    (elixir) lib/gen_server.ex:989: GenServer.call/3
    (broadway_kafka) lib/producer.ex:415: BroadwayKafka.Producer.assignments_revoked/1
    (brod) /app/deps/brod/src/brod_group_coordinator.erl:477: :brod_group_coordinator.stabilize/3
    (brod) /app/deps/brod/src/brod_group_coordinator.erl:391: :brod_group_coordinator.handle_info/2
    (stdlib) gen_server.erl:637: :gen_server.try_dispatch/4
    (stdlib) gen_server.erl:711: :gen_server.handle_msg/6
    (stdlib) proc_lib.erl:249: :proc_lib.init_p_do_apply/3

Hi @amacciola, can you please describe what you mean by "stopping a pipeline"? Please try to provide clear steps: are you stopping Kafka? With which command? Or are you stopping the Elixir code? What are the other instructions around the cluster? Thank you.

@josevalim sorry for not including more details.

I am starting the pipelines under a DynamicSupervisor, and I am stopping a pipeline by sending it a terminate signal:
DynamicSupervisor.terminate_child(__MODULE__, child_pid), where child_pid is the pid of the pipeline itself.

This happens both when I run it locally and on our k8s cluster, where this specific application has 3 pods, each running a pipeline connected to the same ConsumerGroup.
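
For readers following along, the setup described above can be sketched roughly as follows. This is a minimal illustration, not the reporter's actual code: the module name `PipelineSupervisor` and the helpers `start_pipeline/1` and `stop_pipeline/1` are hypothetical, and `child_spec` stands in for whatever child spec the Broadway pipeline module provides.

```elixir
defmodule PipelineSupervisor do
  # Hypothetical supervisor module; the issue only shows the
  # DynamicSupervisor.terminate_child/2 call, not the surrounding code.
  use DynamicSupervisor

  def start_link(opts \\ []) do
    DynamicSupervisor.start_link(__MODULE__, opts, name: __MODULE__)
  end

  @impl true
  def init(_opts) do
    DynamicSupervisor.init(strategy: :one_for_one)
  end

  # `child_spec` is assumed to be the pipeline's child spec.
  # Returns {:ok, pid} on success.
  def start_pipeline(child_spec) do
    DynamicSupervisor.start_child(__MODULE__, child_spec)
  end

  # Stops the pipeline by pid, as described in the comment above.
  def stop_pipeline(child_pid) when is_pid(child_pid) do
    DynamicSupervisor.terminate_child(__MODULE__, child_pid)
  end
end
```

`DynamicSupervisor.terminate_child/2` shuts down the child process, so anything that still holds that pid afterwards (here, the brod group coordinator calling back into the producer) can hit a `no process` exit like the one in the trace.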

Can you please try this patch?

diff --git a/lib/producer.ex b/lib/producer.ex
index 98f3ee4..bcfc5dc 100644
--- a/lib/producer.ex
+++ b/lib/producer.ex
@@ -412,7 +412,12 @@ defmodule BroadwayKafka.Producer do
 
   @impl :brod_group_member
   def assignments_revoked(producer_pid) do
-    GenStage.call(producer_pid, :drain_after_revoke, :infinity)
+    # If the producer_pid is no longer alive, it means the revoke
+    # is happening due to a shutdown, so ignore it.
+    if Process.alive?(producer_pid) do
+      GenStage.call(producer_pid, :drain_after_revoke, :infinity)
+    end
+
     :ok
   end
 

If it works, please send a PR!
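
One general caveat with a `Process.alive?/1` check followed by a call: the process can still exit in the window between the two. The sketch below is not the patch from this thread and not part of broadway_kafka; `SafeCall` and `call_if_alive/3` are hypothetical names. It shows one way to tolerate that window by catching the `:noproc` exit that `GenServer.call/3` raises when the target process is gone.

```elixir
defmodule SafeCall do
  # Hypothetical helper for illustration only.
  # Returns {:ok, reply} when the call succeeds, or :noproc when the
  # target process is not alive at the time of the call.
  def call_if_alive(pid, message, timeout \\ 5_000) do
    {:ok, GenServer.call(pid, message, timeout)}
  catch
    # GenServer.call/3 exits with {:noproc, {GenServer, :call, args}}
    # when there is no process behind the pid.
    :exit, {:noproc, _} -> :noproc
  end
end
```

Other exit reasons (for example, the process shutting down while the call is in flight) are deliberately not caught here; whether to swallow them depends on the shutdown semantics the caller wants.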

@josevalim will do. I'll test it out shortly. Thanks

@josevalim I tested it out and the errors do not appear anymore.

#44
PR for fix

Reopening because the last PR did not completely fix the issue and can still cause a major bug. Will submit a new PR.

#45
new PR