documentcloud / cloud-crowd

Parallel Processing for the Rest of Us

Home Page: https://github.com/documentcloud/cloud-crowd/wiki

JRuby Support

zaach opened this issue

We're looking to run cloud-crowd on JRuby. Right now Thin seems to be blocking this, and perhaps other parts of the code base, so I'm wondering how feasible it would be to see a JRuby version?

I saw your series of JRuby patches -- looks like great stuff. I'd like to try to find a way of integrating it into the project without replacing thin entirely on the mainline -- perhaps an 'application_server' option in config.yml, where you can specify 'mongrel' or 'thin'.
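A minimal sketch of how such an option might be read, purely as an illustration: 'application_server' is only the name proposed above, not an existing CloudCrowd setting, and the `application_server` helper and `SUPPORTED_SERVERS` list are hypothetical. The sketch defaults to thin so that behaviour is unchanged when the option is absent.

```ruby
require 'yaml'

# Hypothetical config.yml fragment. 'application_server' is the option name
# proposed in this thread, not an existing CloudCrowd setting.
CONFIG_YAML = <<~YAML
  application_server: mongrel
YAML

# Assumed set of allowed values, taken from the two servers named above.
SUPPORTED_SERVERS = %w[thin mongrel].freeze

# Returns the configured server name, falling back to 'thin' when the
# option is missing and rejecting anything outside the supported list.
def application_server(config)
  server = config.fetch('application_server', 'thin').to_s
  unless SUPPORTED_SERVERS.include?(server)
    raise ArgumentError, "unsupported application_server: #{server}"
  end
  server
end

config = YAML.safe_load(CONFIG_YAML)
puts application_server(config)
```

The returned name would then be handed to whatever code starts the node or central server; how that wiring looks depends on the JRuby branch.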

Let me know how confident you are that the JRuby branch is fully functional, and I can pull it back in.

Thanks, I was going to message you after verifying it a bit more. There are still a few tweaks that probably need to be made, but that sounds like a good plan.

There seems to be an issue causing some workers to fail. I'm thinking it might be because of a race condition (ughh) between worker threads. When running two or more workers, every so often one fails with an error like this:

Worker #238: failed unit #246 (word_count/processing) in 1.804 seconds
No such file or directory - No such file or directory - /private/var/folders/-W/-WYY+JiiFwmqpzN977l-QE+++TI/-Tmp-/cloud_crowd_tmp/word_count/job_48/unit_245
/Users/zach/.rvm/gems/jruby/1.3.1/gems/cloud-crowd-0.2.6/lib/cloud_crowd/action.rb:113:in `chdir'
/Users/zach/.rvm/gems/jruby/1.3.1/gems/cloud-crowd-0.2.6/lib/cloud_crowd/action.rb:113:in `download_input'
/Users/zach/.rvm/gems/jruby/1.3.1/gems/cloud-crowd-0.2.6/lib/cloud_crowd/action.rb:34:in `initialize'
/Users/zach/.rvm/gems/jruby/1.3.1/gems/cloud-crowd-0.2.6/lib/cloud_crowd/worker.rb:76:in `run_work_unit'
/Users/zach/.rvm/gems/jruby/1.3.1/gems/cloud-crowd-0.2.6/lib/cloud_crowd/worker.rb:100:in `run'
/Users/zach/.rvm/gems/jruby/1.3.1/gems/cloud-crowd-0.2.6/lib/cloud_crowd/node.rb:60:in `POST /work'
/Users/zach/.rvm/gems/jruby/1.3.1/gems/cloud-crowd-0.2.6/lib/cloud_crowd/node.rb:60:in `initialize'

Oddly, it seems the worker is trying to access the wrong unit folder.

Erg, just had a look. Dir.chdir is of course not thread safe.

Ughh is right -- however, the chdir shouldn't need to be threadsafe. Each worker is running in a separate forked process. There are no threads at that point that could be letting actions step on each other's toes.
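To make the distinction concrete, here is a small sketch (not CloudCrowd code; `cwd_isolation_demo` is a made-up name) showing that the working directory belongs to the process: a chdir in a forked child leaves the parent alone, while a chdir in a thread moves the whole process. It assumes MRI on a Unix-like system, since `fork` is not available on JRuby.

```ruby
require 'tmpdir'

# Hypothetical demo helper: reports where the parent process ends up after
# a chdir in a forked child and after a chdir in a thread.
def cwd_isolation_demo
  original = Dir.pwd
  Dir.mktmpdir('scratch') do |dir|
    scratch = File.realpath(dir)
    begin
      # Forked child: the chdir only affects the child's own cwd.
      pid = fork do
        Dir.chdir(scratch)
        exit!(Dir.pwd == scratch ? 0 : 1)
      end
      _, status = Process.wait2(pid)
      after_fork = Dir.pwd

      # Thread: the chdir changes the cwd seen by every thread in the process.
      Thread.new { Dir.chdir(scratch) }.join
      after_thread = Dir.pwd

      { original: original, scratch: scratch, child_ok: status.success?,
        after_fork: after_fork, after_thread: after_thread }
    ensure
      # Step back out so the temporary directory can be removed.
      Dir.chdir(original)
    end
  end
end

result = cwd_isolation_demo
puts "after fork:   #{result[:after_fork]}"
puts "after thread: #{result[:after_thread]}"
```

With two worker threads each doing their own chdir, whichever runs last wins, which would be consistent with a worker looking in another unit's folder.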

Hang on a sec -- did you replace the forking with threads in your JRuby branch? That would certainly break the "each work unit runs in a directory of its own" feature. Or did you see this happen on the master branch?

If it's just with threads on jruby, then we would need to find a way to support the "each work unit runs in a scratch directory" feature. It's what allows us to automatically clean up all the scratch files when the work unit finishes, and what allows you not to have to worry about conflicting file paths while writing an action.
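One possible way to keep a scratch directory per work unit under threads is to hand each action an absolute path rather than changing into the directory. This is only a sketch under that assumption: `with_scratch_dir` and `run_units` are hypothetical helpers, not part of CloudCrowd, and actions that rely on relative paths would need to be adapted.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper: creates a per-unit scratch directory, yields its
# absolute path, and removes it when the block finishes (even on error).
def with_scratch_dir(root, job_id, unit_id)
  dir = File.join(root, "job_#{job_id}", "unit_#{unit_id}")
  FileUtils.mkdir_p(dir)
  yield dir
ensure
  FileUtils.rm_rf(dir) if dir
end

# Runs one thread per unit. Each thread builds paths from its own directory
# and never calls Dir.chdir, so the threads cannot move each other's cwd.
def run_units(unit_ids)
  Dir.mktmpdir('cloud_crowd_tmp') do |root|
    threads = unit_ids.map do |unit_id|
      Thread.new do
        with_scratch_dir(root, 1, unit_id) do |dir|
          path = File.join(dir, 'output.txt')
          File.write(path, "unit #{unit_id}")
          File.read(path)
        end
      end
    end
    threads.map(&:value)
  end
end

puts run_units([1, 2, 3]).inspect
```

Cleanup still happens per unit because the directory is removed in the `ensure`. If an action did need to shell out relative to its directory, MRI's `system` and `Process.spawn` accept a `chdir:` option; how that behaves on a given JRuby version would need checking.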

Interestingly enough, it turns out that there is a long thread from about a year ago on the topic of adding a Thread.current.chdir method to JRuby. Headius is pretty enthusiastic about it, but I don't think it ever made it into the project. It's much more of an urgent issue in JRuby because the costs of running a separate process are relatively huge.

http://www.ruby-forum.com/topic/165079

The thread mentions that if you run multiple JRuby interpreters within a single JVM, each of them has its own cwd. Perhaps this can help you -- having each worker run in its own JRuby instance as an analogue to the MRI forking...

Ah yes, I stumbled across that as well. For our purposes I simply commented out the chdir lines, as our actions didn't need to execute commands relative to the directory (the directory is still cleaned up correctly afterward). Multiple JRuby interpreters sounds like a better general approach though. I'm looking into it.