technoweenie / coffee-resque

Using pub / sub instead of polling

chetan51 opened this issue

Is there a particular reason coffee-resque does not use node_redis's publish / subscribe API to check for new jobs / results of the jobs instead of using polling?

No one's written it. We still use an ancient Redis without pub/sub, so I have no need yet.

I feel like it still needs to poll anyway in the rare case no resque worker is connected when a job is queued. I'm all for it though, as long as it follows the ruby resque project's "spec." I don't recall if the ruby resque lib supports redis pub/sub, but both libs definitely should.

Oh, ok. My friend is looking into replacing polling with pub/sub, while making sure that the case where no worker is connected when a job is queued is taken care of.

Another advantage of pub/sub is that coffee-resque can subscribe and be notified when a particular job is complete, so that means the main app can pass along a callback with a job enqueue, and have the callback fired with the results of the job when it is complete.
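A minimal sketch of that completion-callback idea, using an in-memory stand-in rather than a real Redis connection. Everything here is hypothetical and only for illustration: the `FakeRedis` class, the `result:<id>` channel naming, and the `enqueue` / `work_one` helpers are assumptions, not part of coffee-resque or the ruby resque "spec."

```python
from collections import defaultdict, deque
from itertools import count


class FakeRedis:
    """In-memory stand-in for a Redis list plus named pub/sub channels (illustration only)."""

    def __init__(self):
        self.jobs = deque()
        self.channels = defaultdict(list)

    def rpush(self, item):
        self.jobs.append(item)

    def lpop(self):
        return self.jobs.popleft() if self.jobs else None

    def subscribe(self, channel, callback):
        self.channels[channel].append(callback)

    def publish(self, channel, message):
        # Broadcast: every subscriber on the channel gets the message.
        for callback in list(self.channels[channel]):
            callback(message)


_job_ids = count(1)


def enqueue(redis, func_name, args, on_complete):
    """Queue a job and register a callback for its result (hypothetical helper)."""
    job_id = next(_job_ids)
    # Subscribe before pushing so the result cannot be published before we listen.
    redis.subscribe(f"result:{job_id}", on_complete)
    redis.rpush((job_id, func_name, args))
    return job_id


def work_one(redis, handlers):
    """Worker side: pop one job, run it, publish the result on the job's channel."""
    job = redis.lpop()
    if job is None:
        return False
    job_id, func_name, args = job
    result = handlers[func_name](*args)
    redis.publish(f"result:{job_id}", result)
    return True


redis = FakeRedis()
results = []
enqueue(redis, "add", (2, 3), results.append)
work_one(redis, {"add": lambda a, b: a + b})
```

In this sketch the app subscribes to the result channel before the job is pushed; with real Redis pub/sub a result published while nobody is subscribed would simply be lost.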

I'm curious if pub/sub is 'correct' for resque. I think push notification of jobs would be great, if the messaging was point to point and durability was built into the messaging system. The concern I would have around a raw redis pub/sub solution is the potential for duplicating work in a cluster. Let's say that I have a class of jobs that are very CPU-intensive, and for that reason I decide to run resque on multiple host machines in order to scale and support my required throughput. Wouldn't all the nodes in the cluster, assuming the same redis instance is used for publishing work, receive the same job notifications? The current poll-based solution allows me to scale in this fashion.

There is at least one way I can think of where a combination of pub/sub and one or more redis lists could be used to support a hacky point to point model. In this configuration, when a job is enqueued, a message ('come and get some new work') is published to a channel which is a means of notification only, i.e. it doesn't contain the job definition. This in turn triggers all subscribers to lpop the job list. The first worker to lpop (thanks to redis atomic ops) wins the job and the other nodes go back to waiting. This would only solve half of the problem though: it avoids duplicated work, but not the lack of durability.
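The notify-then-lpop idea can be sketched roughly like this. It is an in-memory simulation, not real Redis: `FakeRedis`, `Worker`, `enqueue`, and the `"new-work"` message are hypothetical names made up for illustration. The only Redis behaviours it assumes are that pub/sub delivers each message to every subscriber and that LPOP hands each list element to exactly one caller.

```python
from collections import deque


class FakeRedis:
    """In-memory stand-in for one Redis list and one pub/sub channel (illustration only)."""

    def __init__(self):
        self.jobs = deque()
        self.subscribers = []

    def rpush(self, job):
        self.jobs.append(job)

    def lpop(self):
        # Mirrors the atomicity of LPOP: each element goes to exactly one caller.
        return self.jobs.popleft() if self.jobs else None

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def publish(self, message):
        # Pub/sub is broadcast: every subscriber sees every message.
        for callback in list(self.subscribers):
            callback(message)


class Worker:
    def __init__(self, name, redis):
        self.name = name
        self.redis = redis
        self.done = []
        redis.subscribe(self.on_notify)

    def on_notify(self, _message):
        # The message carries no job data; it only says "go look at the list".
        job = self.redis.lpop()
        if job is not None:
            self.done.append(job)  # this worker won the job
        # otherwise another worker got there first; go back to waiting


def enqueue(redis, job):
    redis.rpush(job)           # the job itself lives in the list
    redis.publish("new-work")  # the channel is notification only


redis = FakeRedis()
workers = [Worker("a", redis), Worker("b", redis), Worker("c", redis)]
enqueue(redis, "resize:1")
enqueue(redis, "resize:2")
processed = [job for w in workers for job in w.done]
```

All three workers are notified for each job, but each job ends up in exactly one worker's `done` list. In this single-threaded simulation the first subscriber always wins; against a real server the winner would depend on timing.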

I've seen some durability hacks built using other redis primitives but pub/sub in redis is intentionally simple. One way to get around subscription durability from resque's perspective would be to support a combination of catch up and subscribe modes. In this configuration a new resque process would poll (catch up on) queued work until it was forced to wait (no more jobs), at which point it would switch into subscribe mode, again using the subscription solution/hack described above.
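A rough sketch of the catch-up-then-subscribe idea, again against an in-memory stand-in rather than real Redis. `FakeRedis`, `CatchUpWorker`, and `enqueue` are hypothetical names for illustration. One detail is my own assumption rather than part of the description above: the worker drains the list once more right after subscribing, to cover a job that might be enqueued between the last empty lpop and the subscription.

```python
from collections import deque


class FakeRedis:
    """In-memory stand-in for one Redis list and one pub/sub channel (illustration only)."""

    def __init__(self):
        self.jobs = deque()
        self.subscribers = []

    def rpush(self, job):
        self.jobs.append(job)

    def lpop(self):
        return self.jobs.popleft() if self.jobs else None

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def publish(self, message):
        # Messages published with no subscribers are simply dropped.
        for callback in list(self.subscribers):
            callback(message)


class CatchUpWorker:
    def __init__(self, redis):
        self.redis = redis
        self.done = []
        self.mode = "catch-up"

    def drain(self):
        # Pop jobs until the list is empty.
        while True:
            job = self.redis.lpop()
            if job is None:
                return
            self.done.append(job)

    def start(self):
        self.drain()                          # catch up on work queued while offline
        self.redis.subscribe(self.on_notify)  # nothing left: switch to subscribe mode
        self.mode = "subscribe"
        self.drain()                          # assumption: re-check to close the gap

    def on_notify(self, _message):
        self.drain()


def enqueue(redis, job):
    redis.rpush(job)
    redis.publish("new-work")


redis = FakeRedis()
enqueue(redis, "job-1")  # no worker connected: the notification is lost,
enqueue(redis, "job-2")  # but the jobs stay in the list
worker = CatchUpWorker(redis)
worker.start()
after_catch_up = list(worker.done)
enqueue(redis, "job-3")  # delivered via notification in subscribe mode
```

The list provides the durability here; the channel only shortens the wait once the backlog is empty.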

All of this is doable, but I'm not sure it is really in resque's scope. It may not take a lot of code to prototype, however.