masa16 / pwrake1

Obsolete version of Pwrake: Parallel workflow extension for Rake

Pwrake

Parallel workflow extension for Rake

  • Author: Masahiro Tanaka

(README in Japanese), (GitHub Repository)

Features

  • Parallelizes all tasks; there is no need to modify the Rakefile or to use multitask.
  • Tasks are executed in the given number of worker threads.
  • Remote execution using SSH.
  • Pwrake is an extension to Rake, not a patch to Rake: Rake and Pwrake coexist.
  • High parallel I/O performance using the Gfarm file system.
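
As an illustration of the first feature, here is a minimal sketch of an ordinary Rakefile whose tasks are independent of one another. The file names and the `sort` command are hypothetical and not taken from Pwrake. The tasks are declared with plain `file` and `task`, not `multitask`.

```ruby
# Rakefile (illustrative sketch; file names and the command are hypothetical)
require "rake" # not needed when run via rake/pwrake; kept so the file also loads standalone

# Three hypothetical inputs that can be processed independently.
INPUTS  = %w[a.dat b.dat c.dat]
OUTPUTS = INPUTS.map { |f| f.sub(/\.dat\z/, ".out") }

# One ordinary file task per output. No output depends on another,
# so a parallel runner is free to execute these tasks concurrently.
INPUTS.zip(OUTPUTS).each do |src, dst|
  file dst => src do |t|
    sh "sort #{t.prerequisites.first} > #{t.name}"
  end
end

# A plain task, not multitask.
task default: OUTPUTS
```

Under this sketch, `rake` would build the three outputs one after another, whereas `pwrake -j 3` is intended to run them in three worker threads without any change to the file.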

Installation

Download the source tgz/zip and expand it, cd to the resulting subdirectory, and install:

$ ruby setup.rb

Or install with gem:

$ gem install pwrake

Usage

Parallel execution using 4 cores on localhost:

$ pwrake -j 4

Parallel execution using all cores on localhost:

$ pwrake -j

Parallel execution using a total of 2*2 cores on 2 remote hosts:

  1. Share your directory among the remote hosts via a distributed file system such as NFS or Gfarm.

  2. Allow passphrase-less SSH access in either of two ways:

    • Add a passphrase-less key generated by ssh-keygen. (Be careful: the key is stored unencrypted.)
    • Add a passphrase-protected key to ssh-agent using ssh-add.
  3. Make a hosts file that lists the remote host names and the number of cores on each:

     $ cat hosts
     host1 2
     host2 2
    
  4. Run pwrake with the --hostfile (or -F) option:

     $ pwrake --hostfile=hosts
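
The hosts file format shown in step 3 can be read with a few lines of Ruby. This is a sketch only and not Pwrake's own parser. The method name `parse_hosts` is hypothetical, and it assumes every line carries both a host name and a core count.

```ruby
# Sketch only: not Pwrake's parser. Mirrors the hosts file from step 3.
HOSTS_TEXT = <<~HOSTS
  host1 2
  host2 2
HOSTS

# Returns a Hash of host name => core count.
# Assumption: each non-empty line is "<hostname> <cores>".
def parse_hosts(text)
  text.each_line.map(&:strip).reject(&:empty?).map do |line|
    name, cores = line.split
    [name, Integer(cores)]
  end.to_h
end

hosts = parse_hosts(HOSTS_TEXT)
total = hosts.values.sum
puts "#{hosts.size} hosts, #{total} cores" # prints "2 hosts, 4 cores"
```

The total of 4 matches the "2*2 cores" stated for this example.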
    

Options

Command line options

    -F, --hostfile FILE              [Pw] Read hostnames from FILE
    -j, --jobs [N]                   [Pw] Number of threads at localhost (default: # of processors)
    -L, --logfile [FILE]             [Pw] Write log to FILE
        --ssh-opt, --ssh-option OPTION
                                     [Pw] Option passed to SSH
        --filesystem FILESYSTEM      [Pw] Specify FILESYSTEM (nfs|gfarm)
        --gfarm                      [Pw] FILESYSTEM=gfarm
    -A, --disable-affinity           [Pw] Turn OFF affinity (AFFINITY=off)
    -S, --disable-steal              [Pw] Turn OFF task steal
    -d, --debug                      [Pw] Output Debug messages
        --pwrake-conf [FILE]         [Pw] Pwrake configuration file in YAML
        --show-conf, --show-config   [Pw] Show Pwrake configuration options
        --report LOG                 [Pw] Report profile HTML from LOG and exit.

pwrake_conf.yaml

  • If pwrake_conf.yaml exists in the current directory, Pwrake reads options from it.

  • Example (in YAML form):

      HOSTFILE : hosts
      LOGFILE : true
      TASKLOG : true
      PROFILE : true
      GNU_TIME : true
      PLOT_PARALLELISM : true
      DISABLE_AFFINITY: true
      DISABLE_STEAL: true
      FAILED_TARGET : delete
      PASS_ENV :
       - ENV1
       - ENV2
    
  • Option list:

      HOSTFILE, HOSTS   default=false
      LOGFILE, LOG      default=none, string=filename, true="Pwrake%Y%m%d-%H%M%S_%$.log"
      TASKLOG           default=none, string=filename, true="Pwrake%Y%m%d-%H%M%S_%$.task"
      PROFILE           default=none, string=filename, true="Pwrake%Y%m%d-%H%M%S_%$.csv"
      WORK_DIR          default=$PWD
      FILESYSTEM        default=nil (autodetect)
      SSH_OPTION        (String) SSH option
      PASS_ENV          (Array) Environment variables passed to SSH
      GNU_TIME          If true, obtains PROFILE data using GNU time
      PLOT_PARALLELISM  If true, plots parallelism using gnuplot
      FAILED_TARGET     ( rename(default) | delete | leave ) failed files
      QUEUE_PRIORITY          RANK(default), FIFO, LIFO, LIHR
      NOACTION_QUEUE_PRIORITY FIFO(default), LIFO, RAND
      NUM_NOACTION_THREADS    default=4 when gfarm, else 1
      THREAD_CREATE_INTERVAL  default=0.01 (sec)
      HALT_QUEUE_WHILE_SEARCH true|false
      GRAPH_PARTITION         true|false
    

    For the Gfarm file system:

      DISABLE_AFFINITY  default=false
      DISABLE_STEAL     default=false
      STEAL_WAIT        default=0 (sec)
      STEAL_WAIT_MAX    default=10 (sec)
       : Wait min(STEAL_WAIT*2**n, STEAL_WAIT_MAX) sec for task steal.
      GFARM_BASEDIR     default="/tmp"
      GFARM_PREFIX      default="pwrake_$USER"
      GFARM_SUBDIR      default='/'
      MAX_GFWHERE_WORKER  default=8
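
The STEAL_WAIT formula above can be worked through in Ruby. This is a minimal sketch, not Pwrake's implementation. It assumes that n counts successive steal attempts starting at 0, and it uses a hypothetical STEAL_WAIT of 1 because the default of 0 makes every wait 0. The helper name `steal_wait` is also hypothetical.

```ruby
require "yaml"

# Hypothetical settings in the same YAML form as pwrake_conf.yaml.
# The README default for STEAL_WAIT is 0; 1 is used here for illustration.
conf = YAML.load(<<~CONF)
  STEAL_WAIT : 1
  STEAL_WAIT_MAX : 10
CONF

# README formula: min(STEAL_WAIT * 2**n, STEAL_WAIT_MAX)
# Assumption: n is the number of previous steal attempts.
def steal_wait(n, base, max)
  [base * 2**n, max].min
end

waits = (0..5).map { |n| steal_wait(n, conf["STEAL_WAIT"], conf["STEAL_WAIT_MAX"]) }
puts waits.inspect # prints [1, 2, 4, 8, 10, 10]
```

The wait doubles on each step until it reaches STEAL_WAIT_MAX and stays there.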
    

Note for Gfarm

  • The gfwhere-pipe script (included in Pwrake) is used for file-affinity scheduling. This script requires Ruby/FFI (https://github.com/ffi/ffi). Install FFI with:

      gem install ffi
    

For Graph Partitioning

Tested Platform

  • Ruby 2.1.4
  • Rake 10.1.0
  • CentOS 6.4

Acknowledgment

This work is supported by

  • JST CREST, research area: "Development of System Software Technologies for Post-Peta Scale High Performance Computing," and
  • MEXT Promotion of Research for Next Generation IT Infrastructure "Resources Linkage for e-Science (RENKEI)."

License: MIT License
