technoweenie / coffee-resque

Polling for jobs just stops when idled for extended period

perezd opened this issue · comments

I've noticed that after polling for jobs for a while, coffee-resque stops polling entirely and lets work pile up. If I pass work through it regularly, it never stops, but if it sits idle for roughly 15 minutes, just polling the Redis server, it eventually stops checking in with Redis. I've verified this using the MONITOR command in redis-cli.

Not sure where to look to fix this :(

I'm sure you already understand the callback contract between job functions and resque. With the current design, it is very important to ensure you always call back to resque when a job is complete (success or otherwise). If you don't, workers can and will get stuck. If all the workers end up in this state, then you definitely will see the polling stop. I'm not saying this is what is happening in your case; I just wanted to reiterate this from the readme.

We had another user with the same issue, and he reported back that he found a place where an error was escaping his job logic and thus bypassing the callback. Ultimately he fixed the logic and the problem went away. I have a product based on v0.1.2 that has been running in production for over a month, with 4 VMs running workers, and luckily I haven't encountered what you are seeing.
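One way to guard against an error escaping a job and bypassing the callback is to wrap each job function. The following is only a minimal sketch: `safeJob` is a hypothetical helper, not part of coffee-resque, and it assumes the callback is the last argument passed to the job, as in the readme examples. It only covers synchronous throws.

```javascript
// Hypothetical helper (not part of coffee-resque): wraps a job function so a
// synchronous throw is reported through the callback instead of escaping.
function safeJob(job) {
  return function () {
    var args = Array.prototype.slice.call(arguments);
    var callback = args.pop(); // assumption: the callback is the last argument
    var called = false;

    // Ensure the original callback is invoked at most once.
    function done(result) {
      if (called) return;
      called = true;
      callback(result);
    }

    try {
      job.apply(this, args.concat(done));
    } catch (err) {
      done(err); // hand the error back rather than leaving the worker stuck
    }
  };
}

// Example job table; the job names are illustrative only.
var jobs = {
  add: safeJob(function (a, b, callback) { callback(a + b); }),
  explode: safeJob(function (arg, callback) { throw new Error('boom'); })
};
```

With this wrapper, `jobs.explode` still calls back (with the thrown `Error`), so the worker is not left waiting.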

To further isolate this, I started a set of workers and didn't enqueue any work for 20+ minutes, and the polling stayed steady. Without being privy to the code in question, I can't give much guidance beyond this. If you are working on something that is open, and if you are interested, I'd be happy to review it to further assist you. Just let me know.

This is good advice, I'll double check my code. Thanks!

No worries. I'm going to reopen this for tracking's sake in the event that you find a bug with the module. If you determine otherwise, please close it.

Hey, in my case I had uncaught exceptions that prevented the callbacks from being called. That caused worker starvation. You can try wrapping the worker setup in a try/catch:

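var worker;
try {
  // `jobs` is your object of job functions
  worker = require('coffee-resque').connect().worker('high', jobs);
  worker.start();
}
catch (err) {
  // restart the worker if it was created before the exception
  if (worker) {
    worker.end();
    worker.start();
  }
}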
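Note that a try/catch around `worker.start()` only sees exceptions thrown synchronously, so a job that silently never calls back would not be caught by it. As a rough sketch of one possible safeguard, a job can be wrapped with a timeout that calls back with an error if the job has not finished in time. `withTimeout` is a hypothetical helper, not part of coffee-resque, and it assumes the callback is the last argument passed to the job.

```javascript
// Hypothetical helper (not part of coffee-resque): calls back with an Error
// if the wrapped job has not called back within `ms` milliseconds.
function withTimeout(job, ms) {
  return function () {
    var args = Array.prototype.slice.call(arguments);
    var callback = args.pop(); // assumption: the callback is the last argument
    var finished = false;

    // Invoke the original callback at most once and cancel the timer.
    function finish(result) {
      if (finished) return;
      finished = true;
      clearTimeout(timer);
      callback(result);
    }

    var timer = setTimeout(function () {
      finish(new Error('job timed out after ' + ms + 'ms'));
    }, ms);

    job.apply(this, args.concat(finish));
  };
}

// Illustrative job that never calls back on its own.
var stuck = withTimeout(function (arg, callback) { /* never calls back */ }, 50);
```

The timeout value is a judgment call: too short and slow but healthy jobs are reported as failed, too long and a stuck worker sits idle for that whole period.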