Swarm is a script designed to simplify submitting a group of commands to the Biowulf cluster.
This version has been forked from the original NIH one, to run at Pawsey Supercomputing Centre.
Suppose in your work directory you have a text file called list, which contains a long list of serial commands to be executed through Slurm.
- To run them with one job and one core per command, type:
swarm -f list
- To run as job packs, to fill entire nodes of the cluster, run:
swarm -f list -p auto
You can set memory and thread requirements using -g and -t, respectively. Multi-threaded swarms are currently not compatible with multi-packed swarms.
Run swarm -h for additional options.
The swarm script accepts a number of input parameters along with a file containing a list of commands that otherwise would be run on the command line. swarm parses the commands from this file and writes them to a series of command scripts. Then a single batch script is written to execute these command scripts in a Slurm job array, using the user's inputs to guide how Slurm allocates resources.
Packing a swarm means running multiple commands per subjob in parallel, by allocating multiple cores and running one command per core. This can be quite useful when running on systems (such as Magnus at Pawsey Supercomputing Centre) where the minimum job allocation is an entire node; packing helps maximise the usage of hardware resources.
Bundling a swarm means running two or more commands per subjob serially, uniformly. For example, if there are 1000 commands in a swarm, and the bundle factor is 5, then each subjob will run 5 commands serially, resulting in 200 subjobs in the job array.
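The bundling arithmetic above can be sketched in a few lines of Python. This is only an illustration, not swarm's actual implementation: the function name bundle is hypothetical, and the handling of a final, shorter group when the command count is not a multiple of the bundle factor is an assumption.

```python
def bundle(commands, bundle_factor):
    """Split commands into consecutive groups of at most bundle_factor items.

    Each group stands for one subjob that runs its commands serially.
    Assumption: a trailing group may be shorter than bundle_factor.
    """
    if bundle_factor < 1:
        raise ValueError("bundle_factor must be at least 1")
    return [
        commands[i:i + bundle_factor]
        for i in range(0, len(commands), bundle_factor)
    ]


# 1000 placeholder commands with a bundle factor of 5
commands = [f"echo {i}" for i in range(1000)]
subjobs = bundle(commands, 5)
print(len(subjobs))  # 200
```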
Folding a swarm means running commands serially in a subjob only if the number of subjobs exceeds the maximum array size (maxarraysize = 1000). This is a new concept. Previously, if a swarm exceeded the maxarraysize, then either the swarm would fail, or the swarm would be autobundled until it fit within the maxarraysize number of subjobs. Thus, if a swarm had 1001 commands, it would be autobundled with a bundle factor of 2 into roughly 500 subjobs, each running 2 commands serially. With folding, this would still result in 1000 subjobs, but one subjob would have 2 serial commands, while the rest have 1.
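The difference between folding and autobundling is easiest to see by counting commands per subjob. The sketch below is a hypothetical model, not swarm's code: the function name fold is made up, and which subjobs receive the extra commands is an assumption (here, the first ones).

```python
def fold(n_commands, max_array_size=1000):
    """Return the number of serial commands assigned to each subjob.

    The job array never grows beyond max_array_size; surplus commands
    are spread one at a time over the first subjobs (an assumption).
    """
    if n_commands <= 0:
        return []
    n_subjobs = min(n_commands, max_array_size)
    base, extra = divmod(n_commands, n_subjobs)
    return [base + 1 if i < extra else base for i in range(n_subjobs)]


sizes = fold(1001)
print(len(sizes), max(sizes), min(sizes))  # 1000 2 1
```

With 1000 commands or fewer, every subjob in this model gets exactly one command, so folding only has an effect once the array limit is exceeded.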
swarm writes everything in your user-specific scratch directory:
/scratch/$PAWSEY_PROJECT/$USER/swarm_$PAWSEY_CLUSTER
├── 4506756 -> YMaPNXtqEF
└── YMaPNXtqEF
    ├── cmd.0
    ├── cmd.1
    ├── cmd.2
    ├── cmd.3
    └── swarm.batch
swarm (running as the user) first creates a subdirectory within the user's directory with a completely random name. The command scripts are named cmd.#, with # being the command index within the job array. The batch script is simply named swarm.batch. All of these are written into the temporary subdirectory.
The batch script swarm.batch hard-codes the path to the temporary subdirectory as the location of the command scripts. This allows the swarm to be rerun, albeit with the same sbatch options.
The module function is initialized and modules are loaded in the batch script. This limits the number of times module load is called to once per swarm, but it also means that the user could override the environment within the swarm commands.
When a swarm job is successfully submitted to Slurm, a jobid is obtained, and a symlink named after that jobid is created that points to the temporary directory. This allows for simple identification of swarm array jobs running on the cluster.
If a submission fails, then no symlink will be created.
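Because the symlink is named after the jobid, the scripts belonging to a given job can be located by following it. The helper below is a minimal sketch under that assumption; swarm_scripts is a hypothetical name, not part of swarm, and swarm_root stands for the swarm_$PAWSEY_CLUSTER directory in scratch.

```python
from pathlib import Path


def swarm_scripts(swarm_root, jobid):
    """Follow the jobid symlink and list the files in the swarm directory.

    Returns (resolved_directory, sorted_file_names), or None when no
    symlink exists for the jobid (for example after a failed submission).
    """
    link = Path(swarm_root) / str(jobid)
    if not link.is_symlink():
        return None
    target = link.resolve()
    return target, sorted(p.name for p in target.iterdir())
```

For the layout shown earlier, calling swarm_scripts on the scratch swarm directory with jobid 4506756 would be expected to return the YMaPNXtqEF directory together with its cmd.# files and swarm.batch.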
When a user runs swarm in development mode (--devel), no temporary directory or files are created.
Swarm has several options for testing things.
--devel:
This option prevents swarm from creating command or batch scripts, prevents it from actually submitting to sbatch, and prevents it from logging to the standard logfile. It also increases the verbosity level of swarm.
--verbose:
This option makes swarm more chatty, and accepts an integer between 0 (silent) and 4. Running a swarm with many commands at level 4 will give a lot of output, so beware.
--debug:
This option is similar to --devel, except that the scripts are actually created. The name of the temporary directory for the swarm.batch and command scripts begins with dev, rather than tmp as it normally does.
--no-run:
A hidden à la carte option that prevents swarm from actually submitting to sbatch.
--no-log:
A hidden à la carte option that prevents swarm from logging.
--logfile:
A hidden à la carte option that redirects logging from the standard logfile to one of your choice.
--no-scripts:
Don't create command and batch scripts.
swarm logs to $MYSCRATCH/swarm_$PAWSEY_CLUSTER/logs/swarm.log
swarm_cleanup.pl logs to $MYSCRATCH/swarm_$PAWSEY_CLUSTER/logs/swarm_cleanup.log
An index file, $MYSCRATCH/swarm_$PAWSEY_CLUSTER/logs/swarm_tempdir.idx, is updated when a swarm is created. This file contains the creation timestamp, user, unique tag, number of commands, and P value (either 1 or 2):
1509019983,mmouse,e4gLIFwqhq,1,1
1509020005,mmouse,aFwi3QYiQ0,13,1
1509020213,dduck2,jqcJTSiIBH,3,1
1509020215,dduck,qqBMb2SLzl,1,1
1509020225,ggoofy,64PZ3h80nB,1000,1
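The index file is plain comma-separated text, so it can be inspected with a short script. The following is a sketch only: SwarmRecord and parse_index are hypothetical names, the field order follows the description above, and treating the first field as Unix epoch seconds is an assumption.

```python
import csv
from datetime import datetime, timezone
from typing import NamedTuple


class SwarmRecord(NamedTuple):
    created: datetime  # assumed to be Unix epoch seconds in the file
    user: str
    tag: str           # unique tag, also the temporary directory name
    n_commands: int
    p_value: int


def parse_index(lines):
    """Parse lines of swarm_tempdir.idx into SwarmRecord tuples."""
    records = []
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        ts, user, tag, n_commands, p_value = row
        created = datetime.fromtimestamp(int(ts), tz=timezone.utc)
        records.append(
            SwarmRecord(created, user, tag, int(n_commands), int(p_value))
        )
    return records


sample = [
    "1509019983,mmouse,e4gLIFwqhq,1,1",
    "1509020005,mmouse,aFwi3QYiQ0,13,1",
    "1509020213,dduck2,jqcJTSiIBH,3,1",
    "1509020215,dduck,qqBMb2SLzl,1,1",
    "1509020225,ggoofy,64PZ3h80nB,1000,1",
]
records = parse_index(sample)
for rec in records:
    print(rec.user, rec.tag, rec.n_commands)
```

To read the real file, pass an open file handle to parse_index instead of the sample list.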