Parallel workflow extension for Rake
- Author: Masahiro Tanaka
(README in Japanese), (GitHub Repository)
- Parallelize all tasks; no need to modify Rakefile, no need to use multitask.
- Tasks are executed in the given number of worker threads.
- Remote execution using SSH.
- Pwrake is an extension to Rake, not a patch to Rake: Rake and Pwrake coexist.
- High parallel I/O performance using the Gfarm file system.
Download source tgz/zip and expand, cd to subdir and install:
$ ruby setup.rb
Or, gem install:
$ gem install pwrake
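Parallel execution using 4 threads at localhost:
$ pwrake -j 4
Parallel execution using as many threads as processors at localhost:
$ pwrake -j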
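As an illustration of the kind of Rakefile this applies to, the sketch below defines three file tasks that do not depend on one another, using plain file and task rather than multitask. The file names (a.txt, b.txt, c.txt) and the upcase conversion are hypothetical, chosen only for this example; nothing in it is specific to Pwrake.

```ruby
# Rakefile (hypothetical example; file names are made up for illustration)
require "rake"

# Three hypothetical inputs. Each output depends only on its own input,
# so the three conversions are independent of one another.
inputs  = %w[a.txt b.txt c.txt]
outputs = inputs.map { |f| f.ext("out") }   # String#ext is provided by Rake

# One file task per output: upcase the input and write the result.
inputs.zip(outputs).each do |src, dst|
  file dst => src do |t|
    File.write(t.name, File.read(t.source).upcase)
  end
end

# Plain `task`, not `multitask`: the Rakefile itself says nothing
# about parallelism.
task :default => outputs
```

With the input files present, the same Rakefile can be run unchanged with rake or with pwrake -j 4; going by the feature list, the independent tasks are then candidates for concurrent execution in the worker threads.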
Parallel execution at remote hosts:
- Share your directory among remote hosts via a distributed file system such as NFS or Gfarm.
- Allow passphrase-less access via SSH in either of the following ways:
  - Add a passphrase-less key generated by ssh-keygen. (Be careful)
  - Add a passphrase using ssh-add.
- Make a hosts file in which remote host names and the number of cores are listed:
  $ cat hosts
  host1 2
  host2 2
- Run pwrake with the option --hostfile or -F:
  $ pwrake --hostfile=hosts
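To make the hosts file format concrete, here is a minimal Ruby sketch that reads "name cores" lines and totals the cores. It is an illustration only, not Pwrake's own parser: the function name parse_hostfile is hypothetical, and treating '#' as a comment marker and a missing core count as 1 are assumptions made for the example.

```ruby
# Illustrative parser for a hosts file in the "name cores" format.
# A sketch only, not Pwrake's own implementation.
def parse_hostfile(text)
  hosts = {}
  text.each_line do |line|
    line = line.sub(/#.*/, "").strip   # assumption: '#' starts a comment
    next if line.empty?
    name, cores = line.split
    # assumption: a missing core count means 1 core
    hosts[name] = cores ? Integer(cores) : 1
  end
  hosts
end

hosts = parse_hostfile("host1 2\nhost2 2\n")
total = hosts.values.inject(0, :+)
puts "#{hosts.size} hosts, #{total} cores in total"
```

For the two-line hosts file used in this example, the script reports 2 hosts and 4 cores in total.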
-F, --hostfile FILE [Pw] Read hostnames from FILE
-j, --jobs [N] [Pw] Number of threads at localhost (default: # of processors)
-L, --logfile [FILE] [Pw] Write log to FILE
--ssh-opt, --ssh-option OPTION
[Pw] Option passed to SSH
--filesystem FILESYSTEM [Pw] Specify FILESYSTEM (nfs|gfarm)
--gfarm [Pw] FILESYSTEM=gfarm
-A, --disable-affinity [Pw] Turn OFF affinity (AFFINITY=off)
-S, --disable-steal [Pw] Turn OFF task steal
-d, --debug [Pw] Output Debug messages
--pwrake-conf [FILE] [Pw] Pwrake configuration file in YAML
--show-conf, --show-config [Pw] Show Pwrake configuration options
--report LOG [Pw] Report profile HTML from LOG and exit.
- If pwrake_conf.yaml exists in the current directory, Pwrake reads options from it.
- Example (in YAML form):
  HOSTFILE : hosts
  LOGFILE : true
  TASKLOG : true
  PROFILE : true
  GNU_TIME : true
  PLOT_PARALLELISM : true
  DISABLE_AFFINITY: true
  DISABLE_STEAL: true
  FAILED_TARGET : delete
  PASS_ENV :
  - ENV1
  - ENV2
- Option list:
  HOSTFILE, HOSTS          default=false
  LOGFILE, LOG             default=none, string=filename, true="Pwrake%Y%m%d-%H%M%S_%$.log"
  TASKLOG                  default=none, string=filename, true="Pwrake%Y%m%d-%H%M%S_%$.task"
  PROFILE                  default=none, string=filename, true="Pwrake%Y%m%d-%H%M%S_%$.csv"
  WORK_DIR                 default=$PWD
  FILESYSTEM               default=nil (autodetect)
  SSH_OPTION               (String) SSH option
  PASS_ENV                 (Array) Environment variables passed to SSH
  GNU_TIME                 If true, obtains PROFILEs using GNU time
  PLOT_PARALLELISM         If true, plot parallelism using GNUPLOT
  FAILED_TARGET            ( rename(default) | delete | leave ) failed files
  QUEUE_PRIORITY           RANK(default), FIFO, LIFO, LIHR
  NOACTION_QUEUE_PRIORITY  FIFO(default), LIFO, RAND
  NUM_NOACTION_THREADS     default=4 when gfarm, else 1
  THREAD_CREATE_INTERVAL   default=0.01 (sec)
  HALT_QUEUE_WHILE_SEARCH  true|false
  GRAPH_PARTITION          true|false
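As a rough illustration of how such a YAML file maps onto option values, the following Ruby sketch parses a configuration string and fills in a couple of the documented defaults. It is not Pwrake's loading code: the yaml_text content, the defaults hash, and the merge step are assumptions made for the example, and only the key names and default values are taken from the option list.

```ruby
require "yaml"

# Hypothetical pwrake_conf.yaml content, using documented option keys.
yaml_text = <<~YAML
  HOSTFILE : hosts
  LOGFILE : true
  FAILED_TARGET : delete
  PASS_ENV :
    - ENV1
    - ENV2
YAML

# A few defaults copied from the option list (not the full set).
defaults = {
  "FAILED_TARGET"          => "rename",
  "THREAD_CREATE_INTERVAL" => 0.01,
}

# Values from the file override the defaults.
conf = defaults.merge(YAML.safe_load(yaml_text))

puts conf["HOSTFILE"]
puts conf["FAILED_TARGET"]
puts conf["PASS_ENV"].inspect
puts conf["THREAD_CREATE_INTERVAL"]
```

Here FAILED_TARGET comes from the file (delete), while THREAD_CREATE_INTERVAL keeps its documented default of 0.01 because the file does not set it.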
Options for the Gfarm file system:
  DISABLE_AFFINITY    default=false
  DISABLE_STEAL       default=false
  STEAL_WAIT          default=0 (sec)
  STEAL_WAIT_MAX      default=10 (sec)
                      Wait min(STEAL_WAIT*2**n, STEAL_WAIT_MAX) sec for task steal.
  GFARM_BASEDIR       default="/tmp"
  GFARM_PREFIX        default="pwrake_$USER"
  GFARM_SUBDIR        default='/'
  MAX_GFWHERE_WORKER  default=8
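The wait formula min(STEAL_WAIT*2**n, STEAL_WAIT_MAX) is an exponential backoff with a cap. The sketch below evaluates it for a few values of n; the function name steal_wait_seconds is hypothetical, n is assumed to be a retry counter starting at 0, and STEAL_WAIT=1 is chosen for illustration because the default of 0 always yields a wait of 0.

```ruby
# Wait time for the n-th attempt: min(STEAL_WAIT * 2**n, STEAL_WAIT_MAX).
# Assumption: n counts attempts from 0.
def steal_wait_seconds(n, steal_wait, steal_wait_max)
  [steal_wait * 2**n, steal_wait_max].min
end

# Illustrative settings: STEAL_WAIT=1, STEAL_WAIT_MAX=10 (the documented default).
waits = (0..5).map { |n| steal_wait_seconds(n, 1, 10) }
puts waits.inspect   # → [1, 2, 4, 8, 10, 10]
```

The wait doubles on each attempt until it reaches STEAL_WAIT_MAX and then stays there.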
- The gfwhere-pipe script (included in Pwrake) is used for file-affinity scheduling. This script requires Ruby/FFI (https://github.com/ffi/ffi). Install FFI by:
  $ gem install ffi
- Compile and install METIS 5.1.0 (http://www.cs.umn.edu/~metis/). This requires CMake.
- Install RbMetis (https://github.com/masa16/rbmetis) by:
  $ gem install rbmetis -- \
      --with-metis-include=/usr/local/include \
      --with-metis-lib=/usr/local/lib
- Ruby 2.1.4
- Rake 10.1.0
- CentOS 6.4
This work is supported by
- JST CREST, research area: "Development of System Software Technologies for Post-Peta Scale High Performance Computing," and
- MEXT Promotion of Research for Next Generation IT Infrastructure "Resources Linkage for e-Science (RENKEI)."