If I stop the worker and start it later the periodic job runs hundreds of times

Question

ciokan opened this issue 7 years ago

I have a periodic job that runs every second. If I stop the worker for 5 minutes (usually during development) and then start it again, the job runs 300 times in a row before returning to its cycle of one run per second.

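```go
package main

import (
	"os"
	"os/signal"
	"syscall"

	"github.com/astaxie/beego"
	"github.com/gocraft/work"
)

// Context, redisPool, and the package-level CreateInvoices() are defined
// elsewhere in the project and are not shown here.

func main() {
	beego.Debug("Starting worker")
	pool := work.NewWorkerPool(Context{}, 1, "typely", redisPool)

	// Add middleware that will be executed for each job
	pool.Middleware((*Context).Log)

	// Enqueue "create_invoices" every second and register its handler
	pool.PeriodicallyEnqueue("*/1 * * * * *", "create_invoices")
	pool.Job("create_invoices", (*Context).CreateInvoices)

	// Start processing jobs
	pool.Start()

	// Wait for a signal to quit. os.Kill (SIGKILL) cannot be caught,
	// so listen for SIGTERM instead.
	signalChan := make(chan os.Signal, 1)
	signal.Notify(signalChan, os.Interrupt, syscall.SIGTERM)
	<-signalChan

	// Stop the pool
	pool.Stop()
}

// Log is middleware that logs the name of each job before it runs.
func (c *Context) Log(job *work.Job, next work.NextMiddlewareFunc) error {
	beego.Debug("Starting job: ", job.Name)
	return next()
}

// CreateInvoices is the handler for the "create_invoices" job.
func (c *Context) CreateInvoices(job *work.Job) error {
	return CreateInvoices()
}
```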

If I change the cron spec during the downtime so that the job runs every hour, the worker still makes up for the lost seconds on restart: it runs the job 300 times and only then enters the new cycle of one run per hour.

Is this desired behavior?

Xiao Zhang · Answer 1 · Tue Oct 10 2017 15:06:44 GMT+0800 (China Standard Time)

Hi, I found this "issue", too.

After investigating, I think the root cause is that the "periodicEnqueuer" enqueues jobs to the queue "[name_space]:scheduled" ahead of time. When you start a worker pool 5 minutes later, the "schedulerRequeuer" pushes these outdated jobs to the queue "[name_space]:jobs:[job_name]", where the workers then process them.

At first, I tried to solve this myself by comparing "now()" with the scheduled time, but I failed because I cannot get the scheduled time: "Job.enqueuedAt" is the time the job was put into the queue "[name_space]:jobs:[job_name]", which is now(), and that time is changed in "redis.go@line 235". Besides, I don't think such a comparison is elegant.

Then I remembered how this problem is solved in Kubernetes (K8S):
https://kubernetes.io/docs/concepts/workloads/controllers/cron-jobs/#starting-deadline-seconds
Now I'm trying to add a similar feature.

I'll create a PR when I finish, alongside two other PRs, #67 and #68, which address what I think are bugs.

Xiao Zhang · Answer 2 · Tue Nov 07 2017 19:04:03 GMT+0800 (China Standard Time)

Hi, I created a PR:
#78