ajvb / kala

Modern Job Scheduler

Race Condition: Failed One-Off Job with Epsilon Panics

tooolbox opened this issue

When a job is initialized, if it has no schedule, it is run right away.

	// TODO: Delete from cache after running.
	if j.Schedule == "" {
		// If schedule is empty, its a one-off job.
		go j.Run(cache)
		return nil
	}

	j.lock.Unlock()
	err = j.InitDelayDuration(true)
	j.lock.Lock()
	if err != nil {
		j.lock.Unlock()
		cache.Delete(j.Id)
		j.lock.Lock()
		return err
	}

You can see that the run starts before InitDelayDuration() is called.

If the job has multiple tries, it could also have an Epsilon to space out those tries. However, the Epsilon is parsed in InitDelayDuration().

If the one-off job fails immediately in that separate goroutine, the Runner will call shouldRetry(), which attempts to access the still-unparsed (nil) Epsilon duration and panics.

Since an Epsilon is really only applicable to jobs with a Schedule, the best solution is probably to tweak shouldRetry() so it reads like this:

func (j *JobRunner) shouldRetry() bool {
	// Check number of retries left
	if j.currentRetries == 0 {
		return false
	}

	// Check Epsilon
	if j.job.Epsilon != "" && j.job.Schedule != "" {
		if j.job.epsilonDuration.ToDuration() != 0 {
			timeSinceStart := time.Now().Sub(j.job.NextRunAt)
			timeLeftToRetry := j.job.epsilonDuration.ToDuration() - timeSinceStart
			if timeLeftToRetry < 0 {
				return false
			}
		}
	}

	return true
}

i.e. add && j.job.Schedule != "" to the Epsilon check.
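
To illustrate why the extra condition avoids the panic, here is a minimal, self-contained sketch. It does not use kala's real types: parsedDuration, job, shouldRetry (as a free function) and unguardedPanics are hypothetical stand-ins, and the only assumption carried over from the issue is that the parsed Epsilon is a pointer that stays nil until InitDelayDuration() has run.

```go
package main

import (
	"fmt"
	"time"
)

// parsedDuration is a hypothetical stand-in for the parsed Epsilon value.
type parsedDuration struct{ d time.Duration }

// ToDuration dereferences its receiver, so it panics on a nil pointer.
func (p *parsedDuration) ToDuration() time.Duration { return p.d }

// job holds only the fields that the retry check reads.
type job struct {
	Schedule        string
	Epsilon         string
	epsilonDuration *parsedDuration // nil until the Epsilon string is parsed
	NextRunAt       time.Time
}

// shouldRetry applies the Epsilon window only when the job has a Schedule.
func shouldRetry(j *job, retriesLeft int) bool {
	if retriesLeft == 0 {
		return false
	}
	if j.Epsilon != "" && j.Schedule != "" {
		if eps := j.epsilonDuration.ToDuration(); eps != 0 {
			// Stop retrying once the Epsilon window has elapsed.
			if eps-time.Since(j.NextRunAt) < 0 {
				return false
			}
		}
	}
	return true
}

// unguardedPanics reports whether the check without the Schedule
// condition panics for j.
func unguardedPanics(j *job) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	if j.Epsilon != "" {
		_ = j.epsilonDuration.ToDuration()
	}
	return false
}

func main() {
	// A one-off job: Epsilon is set, but there is no Schedule and the
	// Epsilon string was never parsed.
	oneOff := &job{Epsilon: "PT5S"}
	fmt.Println(unguardedPanics(oneOff)) // true
	fmt.Println(shouldRetry(oneOff, 2))  // true
}
```

With the guard in place, a one-off job retries purely on its remaining retry count, and the unparsed Epsilon is never touched.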