metacal-pipeline

CWL-based metacalibration pipeline to post-process DM outputs.

Content of this repository

Docker container for the pipeline

To make the pipeline portable, all the necessary tools are packaged into a single container, which the pipeline engine uses to execute the workflow. In particular, this container includes the DM stack, ngmix, ngmixer and medsdm.

The container is automatically built by Docker Hub upon changes to the master branch: https://hub.docker.com/r/eiffl/metacal

You can pull the container manually; otherwise the workflow engine downloads it automatically upon execution.
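A manual pull could look like the sketch below. The image name comes from the Docker Hub URL above; the `latest` tag, the helper name `pull_metacal_image`, and the choice of `shifterimg` when it is available (as on Cori) are assumptions, since the text does not say which tag the workflow references.

```shell
# Image name taken from the Docker Hub URL; the "latest" tag is an assumption.
IMAGE="eiffl/metacal:latest"

# Hypothetical helper: pull through Shifter when it is installed,
# otherwise fall back to plain Docker.
pull_metacal_image() {
  if command -v shifterimg >/dev/null 2>&1; then
    shifterimg pull "docker:${IMAGE}"
  else
    docker pull "${IMAGE}"
  fi
}

# Usage: pull_metacal_image
```

The function is only defined here, not called, so nothing is downloaded until you invoke it yourself.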

CWL description of pipeline stages and workflow

Each pipeline stage is described by a CWL file stored in the tools directory. For instance, the tool that extracts MEDS files from the DM stack is described in medsdm.cwl. These CWL files only describe the inputs and outputs of each tool, as well as the container needed to run it.

Using these tool definitions, the entire workflow is described by a main CWL workflow file metacal-wf.cwl. It can be visualized for instance using the online CWLViewer: https://view.commonwl.org/workflows/github.com/EiffL/metacal-pipeline/blob/master/tools/metacal-wf.cwl

Configuration files

All the configurations for the workflow can be found in the config folder, in particular:

medsdm_config.yaml: Configuration of the medsdm tool that extracts MEDS files from the DM stack
run-dbcoadd-cm-001.yaml: NGMIXER configuration file that describes how the MEDS files are being processed.
metacal-wf-testing.yml: Workflow configuration that specifies the input files and other workflow settings and parameters. This testing config is restricted to a subset of tracts and patches for testing purposes.

How to run the pipeline on Cori

The workflow to produce the shape catalog is described using the Common Workflow Language (CWL) and executed using a CWL runner, called cwl-parsl, based on the Parsl library and cwltool, the reference CWL implementation.

To install cwl-parsl, run the following from a Cori login node:

$ source /global/common/software/lsst/common/miniconda/setup_current_python.sh
$ pip install --user git+https://github.com/EiffL/cwl-parsl

Easy enough... The first line is there to load the shared DESC environment.

Now that cwl-parsl is installed, you can go ahead and clone this repo:

$ git clone git@github.com:EiffL/metacal-pipeline.git
$ cd metacal-pipeline

cwl-parsl provides the outdir, cachedir, and basedir options which define respectively:

the output directory of the final workflow products
a caching directory where each step will be cached to recover from a failure
the base working directory (to stage input/outputs for each pipeline step)
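As a minimal sketch, the three directories could be prepared up front as follows. The directory names are hypothetical, and whether cwl-parsl creates missing directories on its own is not stated here, so creating them beforehand is simply a safe default.

```shell
# Hypothetical layout under your scratch space; adapt the names to your account.
# $SCRATCH is set on NERSC systems; fall back to /tmp elsewhere.
BASEDIR="${SCRATCH:-/tmp}/metacal-workdir"   # staging area for each pipeline step
CACHEDIR="${BASEDIR}/cache"                  # per-step cache used to recover from a failure
OUTDIR="${SCRATCH:-/tmp}/metacal-output"     # final workflow products

# -p creates parent directories and does not fail if they already exist.
mkdir -p "$OUTDIR" "$CACHEDIR" "$BASEDIR"
echo "outdir=$OUTDIR cachedir=$CACHEDIR basedir=$BASEDIR"
```

The variables can then be passed to the corresponding options, for example `--outdir="$OUTDIR"`.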

To run the metacalibration pipeline, use a command line similar to the following, adapting the paths to your own folders:

$ cwlparsl --parsl cori --shifter \
  --outdir=/global/cscratch1/sd/flanusse/metacal-pipeline \
  --cachedir=/global/cscratch1/sd/flanusse/workdir/cache/ \
  --basedir=/global/cscratch1/sd/flanusse/workdir/ \
  tools/metacal-wf.cwl config/metacal-wf-testing.yml

The --parsl cori option tells cwl-parsl to run Parsl on the Cori Slurm system. You can also use --parsl cori-debug, which uses the debug queue instead (limited to 30 minutes, but with faster access).

Due to a Parsl bug (Parsl/parsl#271), using cwl-parsl like this from the login node might leave some zombie processes behind. Be aware of this, and make sure to check for lingering ipcontroller processes after the workflow completes. The interactive-session method described below is safer, but does not scale as well.
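One way to check for leftover processes after a run is sketched below. The process name `ipcontroller` (the ipyparallel controller) is an assumption about what Parsl leaves behind, and the `-a` option requires a reasonably recent procps-ng `pgrep`.

```shell
# Assumed name of the leftover process (the ipyparallel controller).
PATTERN="ipcontroller"

# -u limits the search to your own processes, -f matches against the full
# command line, -a prints the command line next to each PID.
LINGERING="$(pgrep -u "$(id -u)" -fa "$PATTERN" || true)"

if [ -n "$LINGERING" ]; then
  echo "Lingering processes found:"
  echo "$LINGERING"
else
  echo "No lingering ${PATTERN} processes."
fi
```

Any process listed can then be inspected and, if it really is a leftover from the workflow, stopped with `kill <PID>`.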

For debugging purposes, it's easier (and safer) to use an interactive session, like this:

$ salloc -N 1 -q interactive -C haswell -t03:00:00 -L SCRATCH
$ cwlparsl --shifter \
  --outdir=/global/cscratch1/sd/flanusse/metacal-pipeline \
  --cachedir=/global/cscratch1/sd/flanusse/workdir/cache/ \
  --basedir=/global/cscratch1/sd/flanusse/workdir/ \
  tools/metacal-wf.cwl config/metacal-wf-testing.yml

Note the missing --parsl flag, which means that the workflow will use a local thread executor on the interactive node instead of submitting jobs to Slurm.
