sorentwo / oban

💎 Robust job processing in Elixir, backed by modern PostgreSQL and SQLite3

Home Page: https://getoban.pro

Oban worker backoff algorithm

philippe-lammerts-remote opened this issue

Hi Oban team,

I have a question about the docs in the customizing backoff section:
https://hexdocs.pm/oban/Oban.Worker.html#module-customizing-backoff

With the default backoff behavior, the 20th attempt will occur around 12 days after the first attempt.

When calling the default backoff time for the 20th attempt, I get the following:

iex> Oban.Worker.backoff(%Oban.Job{attempt: 20})
1120672

Which is ~12 days, so I expect that the 20th attempt is scheduled ~12 days after the first attempt. But when I look at the code, I see the following:

  # Oban.Queue.Executor
  def ack_event(%__MODULE__{ack: true, job: job, state: :failure, worker: worker} = exec) do
    backoff = if worker, do: worker.backoff(job), else: Worker.backoff(job)

    Engine.error_job(exec.conf, job, backoff)

    exec
  end

  # Oban.Engines.Basic
  @impl Engine
  def error_job(%Config{} = conf, %Job{} = job, seconds) when is_integer(seconds) do
    updates = [
      set: [state: "retryable", scheduled_at: seconds_from_now(seconds)],
      push: [errors: Job.format_attempt(job)]
    ]

    Repo.update_all(conf, where(Job, id: ^job.id), updates)

    :ok
  end

From this code, it seems that the 20th attempt is scheduled ~12 days after the 19th attempt, not after the first: scheduled_at is set to the current time (utc_now) when the 19th attempt fails, plus the backoff duration of ~12 days. Perhaps I overlooked something in the code, but I'm wondering whether the documentation is correct. If it is, what did I miss?
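To make the point about scheduled_at concrete, here is a minimal sketch of how an offset-from-now helper behaves. The module name BackoffSketch and the optional `now` argument are hypothetical, added only for illustration; the real seconds_from_now helper in Oban may be implemented differently.

```elixir
defmodule BackoffSketch do
  # Hypothetical stand-in for the `seconds_from_now/1` helper referenced above.
  # The retry time is an offset from the moment of failure ("now"),
  # not from the time of the first attempt.
  def seconds_from_now(seconds, now \\ DateTime.utc_now()) do
    DateTime.add(now, seconds, :second)
  end
end

# Assumed failure time of the previous attempt, fixed so the example is repeatable.
failed_at = ~U[2024-01-01 00:00:00Z]

# 1_120_672 is the backoff value returned in the iex session above.
retry_at = BackoffSketch.seconds_from_now(1_120_672, failed_at)

# The gap between failure and retry equals the backoff itself.
IO.inspect(DateTime.diff(retry_at, failed_at), label: "seconds until retry")
```

Under this reading, each backoff is measured from the failure that produced it, so the delays accumulate across attempts.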

The documentation is incorrect. That section originally said "the backoff was clamped to 12 days for the 20th attempt", but it was reworded incorrectly to indicate that the total duration for all attempts is 12 days.

How about this wording?

With the default backoff behavior the 20th attempt will occur around 12 days after the 19th attempt, and a total of 25 days after the first attempt.

Here's a full table of the min/max backoffs per-attempt. It was generated to go in the docs, but the ex_doc formatting makes it obnoxiously large and unreadable.

attempt  min backoff       max backoff
1        17s               18s
2        19s               20s
3        23s               25s
4        31s               34s
5        47s               51s
6        1m 19s            1m 26s
7        2m 23s            2m 37s
8        4m 31s            4m 58s
9        8m 47s            9m 38s
10       17m 19s           19m 2s
11       34m 23s           37m 49s
12       1h 8m 31s         1h 15m 22s
13       2h 16m 47s        2h 30m 26s
14       4h 33m 18s        5h 0m 37s
15       9h 6m 22s         10h 1m 0s
16       18h 12m 30s       20h 1m 45s
17       1d 12h 24m 47s    1d 16h 3m 15s
18       3d 0h 49m 18s     3d 8h 6m 13s
19       6d 1h 38m 22s     6d 16h 12m 12s
20       12d 3h 16m 31s    13d 8h 24m 9s
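The rows above look consistent with a base of 2^attempt + 15 seconds plus up to 10% of added jitter. That formula is inferred from the table values, not taken from the Oban source, and a few rows differ from it by a second, so treat the following as an approximate sketch. BackoffTable and its functions are hypothetical names.

```elixir
defmodule BackoffTable do
  # Assumption inferred from the table: base backoff is 2^attempt + 15 seconds.
  def min_backoff(attempt), do: Integer.pow(2, attempt) + 15

  # Assumption: jitter only adds time, up to 10% of the base (truncated).
  def max_backoff(attempt), do: div(min_backoff(attempt) * 11, 10)

  # Formats seconds as days, hours, minutes, and seconds.
  # Unlike the table, zero-valued leading units are always printed.
  def humanize(seconds) do
    {d, rest} = {div(seconds, 86_400), rem(seconds, 86_400)}
    {h, rest} = {div(rest, 3_600), rem(rest, 3_600)}
    {m, s} = {div(rest, 60), rem(rest, 60)}
    "#{d}d #{h}h #{m}m #{s}s"
  end
end

for attempt <- 1..20 do
  min = BackoffTable.humanize(BackoffTable.min_backoff(attempt))
  max = BackoffTable.humanize(BackoffTable.max_backoff(attempt))
  IO.puts("#{attempt}\t#{min}\t#{max}")
end

# Summing all twenty rows gives the cumulative range under the same assumptions.
min_total = Enum.sum(Enum.map(1..20, &BackoffTable.min_backoff/1))
max_total = Enum.sum(Enum.map(1..20, &BackoffTable.max_backoff/1))
IO.puts("total: #{BackoffTable.humanize(min_total)} to #{BackoffTable.humanize(max_total)}")
```

Summed this way, the twenty backoffs span roughly 24 to 27 days, which brackets the "25 days" figure in the proposed wording.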

Thanks a lot for the quick response! 🙏🏻 Yes, that sounds much better! 👍🏻