Incorrect count for attempts upon ultimate failure?
christopherraa opened this issue · comments
- Minion version: 10.22
- Perl version: v5.30.0
- Operating system: Ubuntu 21.04
Steps to reproduce the behavior
- Enqueue a job that should fail after a set number of attempts
- Wait for it to progress to a failed state, meaning all attempts should be exhausted
- Check if attempts has reached 0
The following code demonstrates the issue:
#!/usr/bin/env perl
use Mojo::Base -strict;
use Mojo::Pg;
use Mojolicious::Lite;
use Test::Mojo;
use Test2::V0;
my $t = Test::Mojo->new;
helper pg => sub { state $pg = Mojo::Pg->new('postgresql:///') };
plugin Minion => { Pg => app->pg };
my $attempts = 0;
app->minion->add_task(dies => sub { $attempts++; shift->fail(['ouch']) });
app->minion->backoff(sub { 0 });
my $id = app->minion->enqueue(dies => [] => { attempts => 10});
my $task = app->pg->db->select('minion_jobs', '*', { id => $id })->expand->hash;
my $expected = { attempts => 10, retries => 0, result => undef, state => 'inactive' };
like $task, $expected, 'task should be inactive and have 10 remaining attempts';
app->minion->perform_jobs_in_foreground();
$task = app->pg->db->select('minion_jobs', '*', { id => $id })->expand->hash;
$expected = { attempts => 0, retries => 9, result => ['ouch'], state => 'failed' };
like $task, $expected, 'task should have zero attempts remaining and be retried 9 times';
is $attempts, 10, 'ten attempts recorded';
done_testing();
Expected behavior
I would expect attempts on the job to be set to 0 when all retries have been exhausted, and retries to be set to <attempts> - 1, as that would be the actual number of retries. In other words: total number of attempts = the initial attempt + nine retries.
Expected output from the above script:
# Seeded srand with seed '20210809' from local date.
ok 1 - task should be inactive and have 10 remaining attempts
ok 2 - task should have zero attempts remaining and be retried 9 times
ok 3 - ten attempts recorded
1..3
Actual behavior
The task is marked with the correct number of performed retries, but remaining attempts stops at 1 instead of 0.
Actual output from the script:
# Seeded srand with seed '20210809' from local date.
ok 1 - task should be inactive and have 10 remaining attempts
not ok 2 - task should have zero attempts remaining and be retried 9 times
# Failed test 'task should have zero attempts remaining and be retried 9 times'
# at ./retries-test.t line 28.
# +------------+-----+----+-------+
# | PATH | GOT | OP | CHECK |
# +------------+-----+----+-------+
# | {attempts} | 1 | eq | 0 |
# +------------+-----+----+-------+
ok 3 - ten attempts recorded
1..3
# Looks like you failed 1 test of 3.
Given that attempts steadily decreases as the job is retried, I would assume that attempts hits zero when no attempts remain.
To me it seems like the meaning of attempts changes during the task lifecycle. By that I mean that attempts starts out as "how many times the job is to be attempted before transitioning to a failed state", but after the task has transitioned to failed the meaning is suddenly "indication that the task had one initial attempt", so that if you sum attempts and retries you get the same number as initially passed in the attempts attribute to ->enqueue().
Thinking about this further, it seems like attempts has three distinct meanings:
- before performing task: "how many times the job is to be attempted before transitioning to a failed state"
- during task execution: "remaining attempts before giving up"
- after transitioning to failed: "indication that the task had one initial attempt"
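To make the three meanings above concrete, here is a rough toy model in plain Perl of the behaviour I am observing. It is only a sketch of what the test script reports, not Minion's actual implementation, and simulate_observed is a made-up helper name.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Toy model of the behaviour reported in this issue; NOT Minion's code.
# simulate_observed() is a hypothetical helper, and the task always fails.
sub simulate_observed {
  my ($configured) = @_;
  my %job = (attempts => $configured, retries => 0, state => 'inactive');
  my $runs = 0;

  while ($job{state} ne 'failed') {
    $job{state} = 'active';
    $runs++;

    if ($job{attempts} > 1) {
      # Attempts are left over, so the failure turns into a retry
      $job{attempts}--;
      $job{retries}++;
      $job{state} = 'inactive';
    }
    else {
      # Final attempt: the job fails, but the counter is left at 1
      $job{state} = 'failed';
    }
  }

  $job{runs} = $runs;
  return \%job;
}

my $job = simulate_observed(10);
printf "attempts=%d retries=%d runs=%d state=%s\n",
  @{$job}{qw(attempts retries runs state)};
# prints: attempts=1 retries=9 runs=10 state=failed
```

In this model attempts + retries always equals the value passed to enqueue(), which is why the final value is 1 rather than 0.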
Happy to consider a PR.
I'm a bit unsure what the "correct" behaviour would be, and even which solution could be accepted. The way I see this issue, it is hard to solve without introducing breaking changes in some way.
The simplest solution that I can think of would be to let attempts reach zero when the last attempt is done and make sure that the documentation clearly says that attempts denotes "the number of remaining attempts".
Another solution would be to make a distinction between "the number of attempts each job is configured with" and "the number of remaining attempts". This can be achieved by adding a new property to the job that holds the number of attempts initially configured for it, which would not change during job execution / retries. Naming is hard, but names such as total_attempts, attempt_limit or something of the sort could be considered. attempts would then still signify "the number of remaining attempts" and would at the time of enqueue() be set to the same value as attempt_limit, which would default to 1 to keep things similar to how it is today. In any case I think I would like to see attempts hit zero once all attempts are exhausted. The reason I'd like to see this separation of fields is that with this solution neither of them would change what it signifies during the job lifetime.
Do you have any thoughts or preferences here?