Race Condition: Failed One-Off Job with Epsilon Panics

Question

tooolbox opened this issue 5 years ago

When a job is initialized and has no schedule, it is run right away.

	// TODO: Delete from cache after running.
	if j.Schedule == "" {
		// If schedule is empty, its a one-off job.
		go j.Run(cache)
		return nil
	}

	j.lock.Unlock()
	err = j.InitDelayDuration(true)
	j.lock.Lock()
	if err != nil {
		j.lock.Unlock()
		cache.Delete(j.Id)
		j.lock.Lock()
		return err
	}

You can see that the run starts, and the function returns, before InitDelayDuration() is reached, so for a one-off job it is never called.

If the job has multiple tries, it could also have an Epsilon to space out those tries. However, the Epsilon is parsed in InitDelayDuration().

If the one-off job fails immediately in that separate goroutine, the runner calls shouldRetry(), which tries to use the parsed Epsilon (epsilonDuration). That value is still nil, so the call panics.

Matt Mc · Answer 1 · Sat Nov 16 2019 03:09:56 GMT+0800 (China Standard Time)

Since an Epsilon only really applies to jobs with a Schedule, the best solution is probably to tweak shouldRetry() so that it reads like this:

func (j *JobRunner) shouldRetry() bool {
	// Check number of retries left
	if j.currentRetries == 0 {
		return false
	}

	// Check Epsilon
	if j.job.Epsilon != "" && j.job.Schedule != "" {
		if j.job.epsilonDuration.ToDuration() != 0 {
			timeSinceStart := time.Now().Sub(j.job.NextRunAt)
			timeLeftToRetry := j.job.epsilonDuration.ToDuration() - timeSinceStart
			if timeLeftToRetry < 0 {
				return false
			}
		}
	}

	return true
}

i.e., add && j.job.Schedule != "" to the Epsilon check.