Race Condition: Failed One-Off Job with Epsilon Panics
tooolbox opened this issue · comments
Matt Mc commented
When initing a job, if it has no schedule, it's run right away.
// TODO: Delete from cache after running.
if j.Schedule == "" {
// If schedule is empty, its a one-off job.
go j.Run(cache)
return nil
}
j.lock.Unlock()
err = j.InitDelayDuration(true)
j.lock.Lock()
if err != nil {
j.lock.Unlock()
cache.Delete(j.Id)
j.lock.Lock()
return err
}
You can see that the run starts before InitDelayDuration()
is called.
If the job has multiple tries, it could also have an Epsilon to space out those tries. However, the Epsilon is parsed in InitDelayDuration()
.
If the one-off job immediately fails in that separate goroutine, the Runner will call shouldRetry()
which attempts to access the nil Epsilon, and panics.
Matt Mc commented
Since, in actual fact, an Epsilon is really only applicable to jobs with a Schedule, probably the best solution would be to tweak shouldRetry()
so it reads like this:
func (j *JobRunner) shouldRetry() bool {
// Check number of retries left
if j.currentRetries == 0 {
return false
}
// Check Epsilon
if j.job.Epsilon != "" && j.job.Schedule != "" {
if j.job.epsilonDuration.ToDuration() != 0 {
timeSinceStart := time.Now().Sub(j.job.NextRunAt)
timeLeftToRetry := j.job.epsilonDuration.ToDuration() - timeSinceStart
if timeLeftToRetry < 0 {
return false
}
}
}
return true
}
i.e. add && j.job.Schedule != ""