haught / matlab-slurm

Modify Matlab R2010+ LSF distcomp shared scripts for SLURM scheduler

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Copyright 2010-2013 The MathWorks, Inc.

This folder contains a number of files to allow Parallel Computing Toolbox
to be used with SLURM via the generic cluster interface.

The files in this folder assume that the client and cluster share a file system
and that the client is able to submit directly to the cluster using the
sbatch command.

Note that all the files in this directory will work only for clusters that are
running on UNIX.

Instructions for Use
=====================


On the Client Host
------------------
1. The files in
$MATLABROOT/toolbox/distcomp/examples/integration/slurm/shared
must be present on the MATLAB path. Copy them to $MATLABROOT/toolbox/local or modify
the MATLAB path from within MATLAB.


2. Read the documentation for using the generic cluster interface with
the Parallel Computing Toolbox and familiarize yourself with the different
properties that can be set for a generic cluster.

In the MATLAB Client
--------------------
1. Create a generic cluster object for your cluster.  For independent jobs,
you must use independentSubmitFcn as your submit function.  For communicating jobs, you must
use communicatingSubmitFcn as your submit function.


Example:
% Use a folder that both the client and cluster can access
% as the JobStorageLocation.  If your cluster and client use
% different operating systems, you should specify JobStorageLocation
% to be a structure.  Refer to the documentation on
% generic cluster for more information.
cluster = parallel.cluster.Generic('JobStorageLocation', '/home/JOB_STORAGE_LOCATION');
set(cluster, 'HasSharedFilesystem', true);
set(cluster, 'ClusterMatlabRoot', '/apps/matlab');
set(cluster, 'OperatingSystem', 'unix');

set(cluster, 'IndependentSubmitFcn', @independentSubmitFcn);
% If you want to run communicating jobs (including parallel pools), you must specify a CommunicatingSubmitFcn
set(cluster, 'CommunicatingSubmitFcn', @communicatingSubmitFcn);
set(cluster, 'GetJobStateFcn', @getJobStateFcn);
set(cluster, 'DeleteJobFcn', @deleteJobFcn);


2. Create a job and some tasks, submit the job, and wait for it to finish before
getting the results. Do the same for communicating jobs if so desired.



As an alternative to these steps, create a profile that defines the appropriate
properties and run profile validation to verify that the profile
works correctly.


Description of Files
====================
For more detail about these files, please refer to the help and comments contained in the
files themselves.

MATLAB Functions Required for generic cluster
----------------------------------------------
independentSubmitFcn.m
    Submit function for independent jobs.  Use this as the IndependentSubmitFcn for your generic cluster object.
communicatingSubmitFcn.m
    Submit function for communicating jobs.  Use this as the CommunicatingSubmitFcn for your generic cluster object.
deleteJobFcn.m
    Delete a job on the cluster.  Use this as the DeleteJobFcn for your generic cluster object.
getJobStateFcn.m
    Get the job state from the cluster.  Use this as the GetJobStateFcn for your generic cluster object.

Other MATLAB Functions
-----------------------
extractJobId.m
    Get the cluster's job ID from the submission output.
getSubmitString.m
    Get the submission string for the cluster.


Executable Scripts
-------------------
independentJobWrapper.sh
    Script used by the cluster to launch the MATLAB worker processes for independent jobs.
communicatingJobWrapper.sh
    Script used by the cluster to launch the MATLAB worker processes for communicating jobs.


Optional Customizations
========================
The code customizations listed in this section are clearly marked in the relevant files.

independentSubmitFcn.m
----------------------
independentSubmitFcn provides the ability to supply additional submit arguments to the
sbatch command.  You may wish to modify the additionalSubmitArgs variable to include additional
submit arguments that are appropriate to your cluster.  For more information, refer to the
sbatch documentation provided with your cluster.

communicatingSubmitFcn.m
------------------------
communicatingSubmitFcn calculates the number of nodes to request from the cluster from the
NumWorkersRange property of the communicating job.  You may wish to customize the number of
nodes requested to suit your cluster's requirements.

communicatingSubmitFcn provides the ability to supply additional submit arguments to the
sbatch command.  You may wish to modify the additionalSubmitArgs variable to include additional
submit arguments that are appropriate to your cluster.  For more information, refer to the
sbatch documentation provided with your cluster.

communicatingJobWrapper.sh
--------------------------
communicatingJobWrapper.sh uses the StrictHostKeyChecking=no and UserKnownHostsFile=/dev/null options
for ssh.  You may wish to customize the ssh options to suit your cluster's requirements.  For
more information, refer to your operating system the ssh documentation.

About

Modify Matlab R2010+ LSF distcomp shared scripts for SLURM scheduler


Languages

Language:MATLAB 73.7%Language:Shell 26.3%