dashbitco / broadway

Concurrent and multi-stage data ingestion and data processing with Elixir

Home Page: https://elixir-broadway.org

Unable to checkout db connection for Broadway

michaelst opened this issue

As far as I can tell, I have followed the docs correctly to check out the db connection and allow Broadway to use it. However, I keep getting a DBConnection.OwnershipError:

13:39:30.453 [basic_console_logger] [error] ** (DBConnection.OwnershipError) cannot find ownership process for #PID<0.606.0>.

Here is some of the code for that test.

{:ok, pid} = PubSubBackup.Consumer.start_link(name: __MODULE__)
Sandbox.allow(Repo, self(), pid)

ref = Broadway.test_batch(pid, [message1, message2, message3])
assert_receive {:ack, ^ref, [_, _, _] = data, []}, 2000

When I look up that process in the process list it shows this

[
     registered_name: PubSubBackup.ConsumerTest.Broadway.BatchProcessor_default_0,
     current_function: {:gen_server, :loop, 7},
     initial_call: {:proc_lib, :init_p, 5},
     status: :waiting,
     message_queue_len: 0,
     links: [#PID<0.605.0>],
     dictionary: [
       "$ancestors": [PubSubBackup.ConsumerTest.Broadway.BatchProcessorSupervisor_default,
        PubSubBackup.ConsumerTest.Broadway.BatcherSupervisor_default,
        PubSubBackup.ConsumerTest.Broadway.BatchersSupervisor,
        PubSubBackup.ConsumerTest.Broadway.Supervisor,
        PubSubBackup.ConsumerTest, #PID<0.572.0>],
       "$initial_call": {GenStage, :init, 1},
       rand_seed: {%{
          bits: 58,
          jump: #Function<3.47293030/1 in :rand."-fun.exsplus_jump/1-">,
          next: #Function<0.47293030/1 in :rand."-fun.exsss_next/1-">,
          type: :exsss,
          uniform: #Function<1.47293030/1 in :rand."-fun.exsss_uniform/1-">,
          uniform_n: #Function<2.47293030/2 in :rand."-fun.exsss_uniform/2-">
        }, [235498787232554556 | 20651838642852265]}
     ],
     trap_exit: true,
     error_handler: :error_handler,
     priority: :normal,
     group_leader: #PID<0.64.0>,
     total_heap_size: 986,
     heap_size: 376,
     stack_size: 12,
     reductions: 299,
     garbage_collection: [
       max_heap_size: %{error_logger: true, kill: true, size: 0},
       min_bin_vheap_size: 46422,
       min_heap_size: 233,
       fullsweep_after: 65535,
       minor_gcs: 1
     ],
     suspending: []
   ]

That won’t work because the pid Broadway returns is the root of its supervision tree, not the processes doing the actual work. You will have to mark your Broadway tests as sync (async: false) for now.
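A minimal sketch of what a sync test module might look like, assuming the repo is named PubSubBackup.Repo (the thread only shows `Repo`) and that the sandbox's shared mode is acceptable for these tests. In shared mode every process, including the Broadway processors and batch processors, uses the connection checked out by the test process, so no per-process `allow` call is needed:

```elixir
defmodule PubSubBackup.ConsumerTest do
  # async: false keeps this module from running concurrently with other
  # test modules, which shared sandbox mode requires.
  use ExUnit.Case, async: false

  alias Ecto.Adapters.SQL.Sandbox

  setup do
    # PubSubBackup.Repo is an assumed name; substitute your own repo module.
    :ok = Sandbox.checkout(PubSubBackup.Repo)

    # Share the test process's connection with every other process.
    Sandbox.mode(PubSubBackup.Repo, {:shared, self()})

    :ok
  end
end
```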

What would be required to support async Broadway tests that make use of SQL Sandbox? Maybe one way to do it would be to publicly expose Topology.process_name(s) so that the tests could allow them all.

@stefanchrobot I would investigate supporting the $callers API.
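For context, `$callers` is a process dictionary key that `Task` sets in the processes it spawns, and the suggestion above relies on the connection ownership lookup consulting that key. The snippet below is only an illustration of the mechanism using the standard library; it does not touch Broadway or Ecto:

```elixir
# Task records the spawning process under the :"$callers" key of the
# new process's dictionary.
parent = self()

callers =
  fn -> Process.get(:"$callers") end
  |> Task.async()
  |> Task.await()

# The head of the list is the process that started the task.
IO.inspect(hd(callers) == parent, label: "spawned by parent")
```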

What I ended up implementing was passing a function into the context of start_link so we can call allow in the pipeline.

    setup context do
      self = self()

      allow = fn pid ->
        Sandbox.allow(Genesis.Repo, self, pid)
      end

      {:ok, _pid} = Consumer.start_link(name: context.test, context: %{allow: allow})

      :ok
    end
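On the pipeline side, the function stored in the context has to be invoked from the process that touches the database. A minimal sketch of how that might look, assuming a helper module named SandboxContext (hypothetical, not from the original code) that is called at the top of handle_message/3 and handle_batch/4:

```elixir
defmodule SandboxContext do
  # Hypothetical helper; the module and function names are illustrative.

  # When the test put an `allow` function in the Broadway context, call it
  # with the current (processor or batch processor) pid.
  def maybe_allow(%{allow: allow}) when is_function(allow, 1) do
    allow.(self())
    :ok
  end

  # In production the context has no `allow` key, so this is a no-op.
  def maybe_allow(_context), do: :ok
end

# Intended call sites inside the Broadway module (shown as comments because
# they need the Broadway dependency):
#
#   def handle_message(_processor, message, context) do
#     SandboxContext.maybe_allow(context)
#     ...
#   end
#
#   def handle_batch(_batcher, messages, _batch_info, context) do
#     SandboxContext.maybe_allow(context)
#     ...
#   end
```

The return value of the `allow` function is deliberately ignored, since the same process handles many messages and is only newly allowed on the first call.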

@josevalim I'd like to take a stab at this. Does the $callers API have some sort of spec, or should I just look at Task as a reference implementation? Should the change happen somewhere around Topology, or should I look at applying this to GenStage?

It is definitely a Broadway thing. We should pass the caller as part of the message metadata. Then inside handle_message and handle_batch we look at this metadata and set the caller in the process dictionary accordingly and then revert it.

Then in all test messages we include the relevant caller metadata.
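The "set the caller and then revert it" step described above could be sketched as follows. CallerScope and with_caller/2 are hypothetical names used for illustration; this is not Broadway's implementation, only one way to scope a `$callers` entry to a single callback invocation:

```elixir
defmodule CallerScope do
  # Hypothetical helper, not part of Broadway.

  # Runs `fun` with `caller` prepended to :"$callers", then restores the
  # previous value even if `fun` raises.
  def with_caller(caller, fun) when is_pid(caller) and is_function(fun, 0) do
    previous = Process.get(:"$callers")
    Process.put(:"$callers", [caller | (previous || [])])

    try do
      fun.()
    after
      case previous do
        nil -> Process.delete(:"$callers")
        _ -> Process.put(:"$callers", previous)
      end
    end
  end
end
```

A processor would wrap the user callback in such a scope only when the message metadata carries a caller, leaving messages from real producers untouched.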

@josevalim Regarding "We should pass the caller as part of the message metadata": should the caller metadata be included in all messages or only those pushed via test_message and test_batch?

@stefanchrobot only on test_message/test_batch IMO.

I have struggled to support this in Broadway out of the box but I believe I have found a reasonable way to enable this with the tools available today. When you send a test message, you can include additional metadata:

Broadway.test_message(MyPipeline, message, metadata: %{caller: self()})

Now you can use the telemetry events, which are emitted inside each processor and batch processor process, to customize the ownership:

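# In your test/test_helper.exs
defmodule BroadwayEctoSandbox do
  # Attaches one handler per repo to the events Broadway emits right before
  # handle_message/3 and handle_batch/4 run.
  def attach(repo) do
    events = [
      [:broadway, :processor, :start],
      [:broadway, :batch_processor, :start]
    ]

    :telemetry.attach_many({__MODULE__, repo}, events, &handle_event/4, %{repo: repo})
  end

  # Runs in the processor or batch processor process, so self() is the
  # process that needs access to the caller's sandbox connection.
  def handle_event(_event_name, _event_measurement, %{messages: messages}, %{repo: repo}) do
    # Only test messages carry :caller metadata; anything else is ignored.
    with [%Broadway.Message{metadata: %{caller: caller}} | _] <- messages do
      Ecto.Adapters.SQL.Sandbox.allow(repo, caller, self())
    end

    :ok
  end
end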

BroadwayEctoSandbox.attach(MyRepo)