dashbitco / broadway_dashboard

Keep track of your Broadway pipelines from Phoenix LiveDashboard

Home Page:https://elixir-broadway.org/

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

[error] Handler {BroadwayDashboard.Telemetry, #PID<0.xxx.0>} has failed and has been detached.

leductam opened this issue · comments

commented

I tried integrate broadway_dashboard to live LiveDashboard but got the error:

[error] Handler {BroadwayDashboard.Telemetry, #PID<0.xxx.0>} has failed and has been detached.
Reason={:badkey, :time, %{system_time: 16274584813510685}}
Stacktrace=[
  {BroadwayDashboard.Telemetry, :handle_event, 4,
   [file: 'lib/broadway_dashboard/telemetry.ex', line: 47]},
  {:telemetry, :"-execute/3-fun-0-", 4, ...

My deps

defp deps do
    [
      {:phoenix, github: "phoenixframework/phoenix", override: true},
      {:phoenix_html, github: "phoenixframework/phoenix_html", override: true},
      {:phoenix_live_reload, "~> 1.2", only: :dev},
      {:phoenix_live_view, github: "phoenixframework/phoenix_live_view", override: true},
      {:floki, ">= 0.30.0", only: :test},
      {:phoenix_live_dashboard, github: "phoenixframework/phoenix_live_dashboard", override: true},
      {:esbuild, "~> 0.1", runtime: Mix.env() == :dev},
      {:swoosh, "~> 1.3"},
      {:telemetry_metrics, "~> 0.4"},
      {:telemetry_poller, "~> 0.4"},
      {:jason, "~> 1.0"},
      {:plug_cowboy, "~> 2.0"},
      {:broadway, github: "dashbitco/broadway", override: true},
      {:broadway_rabbitmq, github: "dashbitco/broadway_rabbitmq", override: true},
      {:broadway_dashboard, github: "dashbitco/broadway_dashboard", override: true}
    ]
  end

Please help to clarify this issue.

You are using a too recent broadway version. You can use the same version as in the lockfile of this project or wait until we officially release it.

Those are the relevant CHANGELOG notes we need to update:

  * Rename :events to :messages in batcher telemetry event
  * Rename :time to :system_time in telemetry event
  * Rename `[:broadway, :consumer, *]` to `[:broadway, :batch_processor, *]` in telemetry event

We may have already applied the first.

commented

As your advice, adjusted some measurements.xyz at BroadwayDashboard.Telemetry module.

On the dashboard the Throughput is updated, but I saw that the Pipeline was not updated the % of proc_X.

This problem was fixed in master. I'm trying to figure out another bug in the metrics that is making the pipeline to not be updated after a few events.

commented

I got this error:

[error] Handler {BroadwayDashboard.Telemetry, #PID<0.4100.0>} has failed and has been detached. Class=:error
Reason=:badarith
Stacktrace=[
  {BroadwayDashboard.Telemetry, :calc_workload, 3,
   [file: 'lib/broadway_dashboard/telemetry.ex', line: 139]},
  {BroadwayDashboard.Telemetry, :handle_event, 4,
   [file: 'lib/broadway_dashboard/telemetry.ex', line: 75]},
  {:telemetry, :"-execute/3-fun-0-", 4,
   [
     file: '/deps/telemetry/src/telemetry.erl',
     line: 135
   ]},
  {:lists, :foreach, 2, [file: 'lists.erl', line: 1342]},
  {:telemetry, :span, 3,
   [
     file: '/deps/telemetry/src/telemetry.erl',
     line: 274
   ]},
  {GenStage, :consumer_dispatch, 6, [file: 'lib/gen_stage.ex', line: 2462]},
  {:gen_server, :try_dispatch, 4, [file: 'gen_server.erl', line: 695]},
  {:gen_server, :handle_msg, 6, [file: 'gen_server.erl', line: 771]},
  {:proc_lib, :wake_up, 3, [file: 'proc_lib.erl', line: 236]}
]

commented

@philss I think that we're facing with the problem in this function:
defp calc_workload(start_time, last_end_time, duration) do idle_time = start_time - last_end_time round(duration / (idle_time + duration) * 100) end

Those variables start_time, last_end_time and duration are got/calculated from values retrieved by calling :erlang.monotonic_time() which return Erlang monotonic time.
There are the statement about Erlang monotonic time at https://learnyousomeerlang.com/time :

Erlang monotonic time. This is Erlang's view of the OS monotonic time if available, or the VM's own monotonic version of the system time if not available. This is the adjusted clock that is used for events, timers, and so on. Its stability makes it ideal to count a time interval. Do note that this time is monotonic, but not strictly monotonic, meaning that the clock can't go backwards, but it can return the same value many times!

The Erlang monotonic time can return the same value many times. Then idle_time and duration might coincident are zero at unspecified point in time, and idle_time + duration might zero too. As I shown that I still got the errors Class=:error Reason=:badarith due to division by zero.

I tried with temporary fix and it is worked:
efp calc_workload(start_time, last_end_time, duration) do if duration > 0 do idle_time = start_time - last_end_time round(duration / (idle_time + duration) * 100) else 0 end end

Please take a look.
Thanks.

@leductam thank you for the details.

I think in the case we are using :erlang.monotonic_time() it's nearly impossible to return the same value twice because it will always do some type of work between the two events (start and stop). I believe the problem was fixed. But can you test with the current master version to see if the problem can occur? Thanks!

commented

@philss sure, I performed some tests after your committed new code to master and still got the errors due to division by zero.

Following https://learnyousomeerlang.com/time:

As mentioned earlier, the Erlang monotonic time is not strictly monotonic: it will possibly return the same number twice if it's called at the same time on two different cores, for example.

@leductam oh no :/
do you mind to send a PR with your suggestion/fix?