technoweenie / coffee-resque

Jobs with non-blocking behaviors

steelThread opened this issue

First off, cool framework. I started using it recently as part of a large data slurping and indexing solution. In its current state, I feel coffee-resque is too opinionated with respect to job completeness during the Worker's perform method. In a Node-based batching environment, where non-blocking is the norm, there is a good chance that a job will perform some non-blocking operation and therefore not actually be complete until later. The current implementation assumes a job was successful as soon as the call to the callback returns.
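A minimal TypeScript sketch of the failure mode described above. This is not coffee-resque's actual source; `performSync`, `slurpJob`, and the `events` log are hypothetical names. The worker wraps the job in try..catch..finally and records success as soon as the job function returns, before its non-blocking work has finished.

```typescript
// Hypothetical sketch of the failure mode, NOT coffee-resque's actual code.
type SyncJob = (args: string[]) => void;

const events: string[] = [];

// Reports success as soon as job() returns, because that is all try/finally can observe.
function performSync(job: SyncJob, args: string[]): void {
  try {
    job(args);
    events.push("success");
  } catch (err) {
    events.push(`failure: ${String(err)}`);
  } finally {
    events.push("done");
  }
}

// A job that kicks off non-blocking work and returns immediately.
const slurpJob: SyncJob = (args) => {
  setTimeout(() => {
    // The real outcome arrives later, after the worker has already moved on.
    events.push(`failed ${args[0]}: connection reset`);
  }, 10);
};

performSync(slurpJob, ["page-1"]);
// At this point events is ["success", "done"]; the failure is appended about 10 ms later.
```

The late failure is recorded, but only after the worker has already reported the job as successful.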

Given the async nature of the environment, I think you are going to need to put a callback protocol in place in order to truly determine the outcome of a job. There is a fork of this project that already includes a similar solution; however, it also addresses some other concerns that I'm not really interested in at this point. In any case, I added a two-line change to your master branch to get the desired behavior, and it's working for me. Here's a gist showing the solution.
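One way to sketch the proposed callback protocol, assuming jobs receive a `done(err, result)` callback as their last argument. The names `performAsync`, `Outcome`, and `fetchJob` are hypothetical and do not reproduce the gist or coffee-resque's API. The worker reports an outcome only when the job calls `done`, and a synchronous throw is still treated as a failure.

```typescript
// Hypothetical callback protocol; names are illustrative, not coffee-resque's API.
type Done = (err: Error | null, result?: unknown) => void;
type AsyncJob = (args: string[], done: Done) => void;

interface Outcome {
  ok: boolean;
  result?: unknown;
  error?: Error;
}

// Runs a job and reports its outcome only when the job calls done().
function performAsync(job: AsyncJob, args: string[], onComplete: (o: Outcome) => void): void {
  let finished = false;
  const done: Done = (err, result) => {
    if (finished) return; // ignore duplicate calls from a buggy job
    finished = true;
    onComplete(err ? { ok: false, error: err } : { ok: true, result });
  };
  try {
    job(args, done);
  } catch (err) {
    // A synchronous throw before the async work starts is still a failure.
    done(err instanceof Error ? err : new Error(String(err)));
  }
}

// Example job: simulates a non-blocking request that succeeds or fails later.
const fetchJob: AsyncJob = (args, done) => {
  setTimeout(() => {
    if (args[0] === "bad") done(new Error("connection reset"));
    else done(null, `indexed ${args[0]}`);
  }, 10);
};

const outcomes: Outcome[] = [];
performAsync(fetchJob, ["doc-1"], (o) => outcomes.push(o));
performAsync(fetchJob, ["bad"], (o) => outcomes.push(o));
```

Because success is decided inside `done`, a job that fails after returning is reported as a failure rather than a false success.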

Why not send a pull request?

I would, but I feel there are a couple of higher-level design questions that should be hashed out first. First, what to do with the response of the callback. We probably want to relay it on the 'success' event and simply pass it to the succeed method. The other issue I'm playing with is the possibility of stopping the polling loop until the job is complete. I'm finding that if I kick off a lot of restler requests, a large percentage of the underlying connections get reset. Also, the whole try..catch..finally structure doesn't really fit with a callback model.
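A sketch of relaying the callback's result on a 'success' event, using Node's `EventEmitter`. `SketchWorker`, its `succeed`/`fail` helpers, and the 'failure' event name are assumptions for illustration, not coffee-resque's confirmed interface. The outcome is decided inside the job's callback rather than in a try..catch..finally block, since a `finally` clause would run before the asynchronous work completes.

```typescript
import { EventEmitter } from "events";

// Hypothetical sketch; event names and helpers are assumptions, not coffee-resque's API.
type Done = (err: Error | null, result?: unknown) => void;
type AsyncJob = (args: string[], done: Done) => void;

class SketchWorker extends EventEmitter {
  private readonly jobs: Record<string, AsyncJob>;

  constructor(jobs: Record<string, AsyncJob>) {
    super();
    this.jobs = jobs;
  }

  perform(name: string, args: string[]): void {
    const job = this.jobs[name];
    if (!job) {
      this.fail(name, new Error(`missing job: ${name}`));
      return;
    }
    // The outcome is decided in the callback, not in a try..catch..finally around the call.
    job(args, (err, result) => {
      if (err) this.fail(name, err);
      else this.succeed(name, result);
    });
  }

  // Relays the job's result to "success" listeners.
  private succeed(name: string, result: unknown): void {
    this.emit("success", name, result);
  }

  private fail(name: string, err: Error): void {
    this.emit("failure", name, err);
  }
}

const worker = new SketchWorker({
  index: (args, done) => {
    setTimeout(() => done(null, { indexed: args.length }), 5);
  },
});

const seen: unknown[] = [];
const failures: string[] = [];
worker.on("success", (name: string, result: unknown) => seen.push([name, result]));
worker.on("failure", (name: string, err: Error) => failures.push(`${name}: ${err.message}`));

worker.perform("index", ["a", "b"]);
worker.perform("missing", []);
```

The sketch emits 'failure' rather than 'error' because Node's `EventEmitter` throws when an 'error' event is emitted with no listener attached.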

Let me work through these issues before I send a pull req. If you have other ideas, please share.

Thanks for the latest gist. I'd prefer looking at a diff, though, which is why I asked for the pull req.

You bring up a great point about callbacks. I think I originally meant to limit it to one job per worker. The current implementation held up against a Ruby script throwing around 15M jobs at Resque from a DB dump, though. coffee-resque handled the jobs like a boss :)

I really like the gist.

It has been a boss for me too. Really nice work. I think the worker callback is absolutely necessary and is really the only way resque can track workers. Also, a polling model that assumes a single worker is performing at most one job at any given moment brings some sanity to the processing. Without it, a single worker would be performing concurrent jobs. This has a couple of potentially bad side effects. First, users couldn't control concurrency, since a worker instance could potentially spool off all the messages (an over-eager consumer). Second, users of resque couldn't model worker logic that required per-job state. With the new polling model, concurrency is controlled (throttled) by running more or fewer worker instances, and there are no restrictions on how workers are implemented.
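A sketch of the one-job-at-a-time polling model: each worker asks for the next job only after the current job's callback fires, so concurrency is set by how many worker instances are running. The in-memory `queue`, `startWorker`, and the polling interval are hypothetical stand-ins for a real Redis-backed queue, not coffee-resque's implementation.

```typescript
// Hypothetical sketch of one-job-at-a-time polling; the in-memory queue stands in for Redis.
type Done = (err: Error | null, result?: unknown) => void;
type AsyncJob = (args: string[], done: Done) => void;

interface QueuedJob {
  name: string;
  args: string[];
}

const queue: QueuedJob[] = [];

// Starts a worker that holds at most one job at a time. Returns a stop function.
function startWorker(id: string, jobs: Record<string, AsyncJob>, log: string[], intervalMs = 5): () => void {
  let stopped = false;

  const poll = (): void => {
    if (stopped) return;
    const next = queue.shift();
    if (!next) {
      setTimeout(poll, intervalMs); // queue empty: check again later
      return;
    }
    log.push(`${id} start ${next.args[0]}`);
    let called = false;
    const finish: Done = (err) => {
      if (called) return; // a second done() must not trigger a second poll
      called = true;
      log.push(`${id} ${err ? "fail" : "done"} ${next.args[0]}`);
      setTimeout(poll, 0); // only now ask the queue for another job
    };
    const job = jobs[next.name];
    if (job) job(next.args, finish);
    else finish(new Error(`missing job: ${next.name}`));
  };

  poll();
  return () => {
    stopped = true;
  };
}

const jobs: Record<string, AsyncJob> = {
  slurp: (args, done) => {
    setTimeout(() => done(null, args[0]), 10);
  },
};

for (let i = 0; i < 4; i++) queue.push({ name: "slurp", args: [`page-${i}`] });

const log: string[] = [];
// Two worker instances: at most two jobs are in flight at any moment.
const stops = [startWorker("w1", jobs, log), startWorker("w2", jobs, log)];
```

With four queued jobs and two workers, each worker alternates strictly between starting and finishing a job; starting a third worker would raise the in-flight limit to three.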

In any case I'll refork and put together a commit stream that is easy to follow. Pull req to follow shortly.

This issue has been resolved with the recent merge of my pull req. Nothing like being added as a collaborator on a project and having your first course of action be closing your own issue.