documentcloud / cloud-crowd

Parallel Processing for the Rest of Us

Home Page: https://github.com/documentcloud/cloud-crowd/wiki

Ability to delete jobs

wnoronha opened this issue

It would be good to be able to delete a job (well, you can do this with Job.find(n).delete).

Doing this should also delete the dependent work units (which does not happen)

Just noticed you have the :dependent attr for the work units.

For automatic job deletion, check out the "Cleaning Up" section on this page:

http://wiki.github.com/documentcloud/cloud-crowd/the-job-api

The workflow for this is: a user decides to start a scraper job for the domain perl.org. Soon after the jobs/nodes have started processing this request, he realizes he meant to scrape pearl.org. We need a clean way to cancel this job.

It's not safe to just delete the job, because a large number of other computers might be right in the middle of processing it. This is something that needs to be handled by your application. I'd recommend having your process method check the status of the model in the database before doing the work (or periodically, while doing it), and aborting if the status of the model is "cancelled".
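A minimal sketch of that status check, in plain Ruby. Everything named here is hypothetical and not part of CloudCrowd: ScrapeRequest stands in for your application's model (an in-memory hash replaces the database lookup), Cancelled is an illustrative error class, and process is a free-standing method rather than a real action's process method.

```ruby
# Hypothetical stand-in for an application model stored in the database.
# In a real app, status_of would be a database query, not a hash lookup.
class ScrapeRequest
  STATUSES = {} # request id => status string

  def self.status_of(id)
    STATUSES.fetch(id, "pending")
  end

  def self.cancel!(id)
    STATUSES[id] = "cancelled"
  end
end

# Raised to abort a unit of work once its request has been cancelled.
class Cancelled < StandardError; end

# Sketch of a process method: re-check the status before each page,
# so the work stops before starting and also partway through.
def process(request_id, pages)
  results = []
  pages.each do |page|
    if ScrapeRequest.status_of(request_id) == "cancelled"
      raise Cancelled, "request #{request_id} was cancelled"
    end
    results << yield(page)
  end
  results
end
```

With this shape, cancelling becomes a status update (ScrapeRequest.cancel!(id) in the sketch) rather than a delete, and each worker stops on its own at the next check. How often to check is a trade-off between extra database reads and how quickly a cancel takes effect.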