documentcloud / cloud-crowd

Parallel Processing for the Rest of Us

Home Page: https://github.com/documentcloud/cloud-crowd/wiki

Windows Support.

Right now, the main thing keeping CloudCrowd from running on Windows is its use of 'fork', via the Daemons gem, to run daemons. We should investigate Win32Utils, at http://win32utils.rubyforge.org/ (although that name doesn't sound promising for 64-bit systems), as well as alternatives to fork, which may mean removing the Daemons gem as a dependency.

Ideas or alternatives? War stories of successful Windows daemons?

I tried installing the gem using RubyInstaller (both 1.8.6 & 1.9) w/ the dev kit on an Amazon Windows 2003 instance, and win32-service crapped out during compile. Same for XP and Vista VMs (using VMware Fusion).

There's really no good information out there on building Ruby services past Ruby version 1.8.4. Will keep investigating.

Wow. That's pretty bad. What exactly is your need for running worker nodes on Windows? It seems like a pretty unusual idea. Custom Windows-only software you want to integrate with an action?

In good news, the current code on master (what will become the 0.2 release) does away with the Daemons gem and long-lived daemons in general, so that's one obstacle out of the way for Windows support.

We have scientific data in a proprietary format that is only accessible through the vendor's Windows-only DLL. We need to convert these files to XML and were planning on building something very close to cloud-crowd.

I saw the code that turns nodes into Sinatra apps and pushes work items to them instead of having workers poll the server. Will keep an eye on that development, but it should be fairly simple to account for Windows file paths in the rest of the code.

Unfortunately, Windows support is getting farther away. With the new node architecture, forking has become pretty critical to the workers. This afternoon, 'max_load' and 'min_free_memory' were added to the configuration options. Both depend on UNIX utilities to get the system's load average and available memory. If there are Windows alternatives that can provide us with all these -- that's wonderful, but if not, it might be better to close this ticket, just say UNIX only, and move on.

OK, I understand the reasons, but it would have been nice.

Took a stab at this today, and with a combination of using the sys-cpu gem and making a system call to systeminfo, I can get the needed CPU and load information for testing whether a node is busy. The last roadblock was fork(), which I solved by threading workers instead of forking. That caused other problems to appear, since there are several places in the code where you change the working directory: a safe assumption when the process is forked, but not with threads. Also, the signal traps on the worker process interfered with the signal trap of the node server.

Long story short, it works on Windows, but only using threads, which is less than optimal. The other choice is to limit a node to one worker when the RUBY_PLATFORM is win32.
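To make the working-directory problem concrete: Dir.chdir changes the directory for the whole process, so every thread sees it. One possible way around that, sketched below, is to never chdir at all and instead resolve every path against a per-worker root. WorkDirectory and run_threaded_workers are made-up names for illustration, not anything in the CloudCrowd codebase.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical per-worker directory wrapper (not CloudCrowd code).
# It builds absolute paths instead of calling Dir.chdir, because the
# current directory is shared by every thread in the process.
class WorkDirectory
  attr_reader :root

  def initialize(root)
    @root = File.expand_path(root)
  end

  # Absolute path to a file inside this worker's directory.
  def path(relative)
    File.join(@root, relative)
  end

  def write(relative, data)
    File.open(path(relative), 'w') { |f| f.write(data) }
  end

  def read(relative)
    File.read(path(relative))
  end
end

# Run `count` threaded workers, each in its own temp directory, and
# return what each one read back from its own result file.
def run_threaded_workers(count)
  roots = Array.new(count) { Dir.mktmpdir('worker') }
  threads = roots.each_with_index.map do |root, i|
    Thread.new do
      dir = WorkDirectory.new(root)
      dir.write('result.txt', "worker #{i}")
      dir.read('result.txt')
    end
  end
  threads.map(&:value)
ensure
  roots.each { |root| FileUtils.remove_entry(root) } if roots
end
```

Each worker writes the same relative file name without stepping on the others, and the process's own Dir.pwd is never touched. This says nothing about the signal trap conflict, which is a separate problem.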
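For the one-worker fallback, the platform check might look something like this. It's only a sketch: windows_platform? and worker_limit are hypothetical helpers, and the regex is an assumption about the RUBY_PLATFORM strings that Windows builds report (things like "i386-mswin32" or "i386-mingw32").

```ruby
# Hypothetical helper, not part of CloudCrowd.
# Matches the platform strings Windows Rubies are assumed to report.
# Note that a bare /win/ would also match "darwin", so be specific.
def windows_platform?(platform = RUBY_PLATFORM)
  !!(platform =~ /mswin|mingw|win32/i)
end

# Cap a node at a single worker on Windows; otherwise use the
# configured maximum unchanged.
def worker_limit(configured_max, platform = RUBY_PLATFORM)
  windows_platform?(platform) ? 1 : configured_max
end
```

So worker_limit(5) would stay at 5 on a UNIX box and drop to 1 on Windows.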

Thoughts?

That's great stuff. It sounds like we're close then. One worker per node won't cut the mustard. We need to either fork or start a subprocess by shelling out. Forking is preferable. Why not try the win32-process library again? It seems like that's what everyone is using.
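A rough sketch of the "fork, or else shell out" idea, using only core Ruby rather than win32-process: fork where the platform supports it, and otherwise spawn a fresh Ruby interpreter. start_worker and its arguments are invented for illustration, and Process.spawn needs Ruby 1.9 or later.

```ruby
require 'rbconfig'

# Hypothetical helper, not CloudCrowd code.
# Starts a worker in a child process and returns its pid.
#   script_args - arguments for a fresh `ruby` process (spawn path)
#   work        - block to run in the child (fork path)
#   can_fork    - overridable so the spawn path can be exercised anywhere
def start_worker(script_args, can_fork = Process.respond_to?(:fork), &work)
  if can_fork && work
    # UNIX path: the child inherits the parent's loaded code and state.
    fork(&work)
  else
    # Fallback path: a brand new interpreter, so the child has to load
    # everything it needs itself.
    Process.spawn(RbConfig.ruby, *script_args)
  end
end
```

Either way the caller gets back a pid it can hand to Process.wait2. The catch with the spawn path is that nothing is inherited from the parent, so the work unit has to be passed on the command line or by some other channel.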

Argh, I didn't see that they had pre-built gems before now! Well anyway, yeah, I think this is a better way to go than threading: sys-cpu for load information, systeminfo for mem usage, and win32-process for forking workers. I'll see if I can have a minimal set of changes tonight.
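For the systeminfo half, the parsing could be as small as the sketch below. It assumes the English-locale output, where the relevant line looks like "Available Physical Memory: 2,041 MB"; other locales label the line differently, so treat the pattern as an assumption. free_memory_mb is a hypothetical name.

```ruby
# Hypothetical helper, not CloudCrowd code.
# Pulls free physical memory (in MB) out of `systeminfo` text.
# Assumes an English-locale line such as:
#   Available Physical Memory: 2,041 MB
# Returns nil when no such line is found.
def free_memory_mb(systeminfo_output)
  line = systeminfo_output.lines.find do |l|
    l =~ /^Available Physical Memory:/i
  end
  return nil unless line

  digits = line[/:\s*([\d,]+)\s*MB/i, 1]
  digits && digits.delete(',').to_i
end
```

On a Windows node this would be called as free_memory_mb(`systeminfo`), and the result compared against 'min_free_memory'.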