
Cellenics Pipeline

The Cellenics pipeline project for dependency-managed work processing.

Getting started

The steps of the pipeline that are run through this project are started automatically on your machine as Docker containers, simulating Kubernetes in local development.

We have included a utility that automatically monitors spawned containers and reads their logs as they execute.

For local development, you should already have Docker and Node.js installed, as well as Inframock running.

Afterwards, you can install the pipeline dependencies with:

make install

To build and run the pipeline:

make build && make run

A message similar to the following should appear:

> node src/app.js

Loading CloudFormation for local container launcher...
Creating mock Lambda function on InfraMock...
No previous stack found on InfraMock.
Stack with ARN arn:aws:cloudformation:eu-west-1:000000000000:stack/local-container-launcher/106d1df9 successfully created.
Waiting for Docker events...

Logs from pipelines run through the API will appear here.

Rebuilding the Docker images

make build

Local development and adding dependencies

First make sure the project library is synchronized with the lockfile:

# inside pipeline-runner folder
renv::restore()

NOTE: To restore Bioconductor packages, your R version needs to match the one in the Dockerfile (4.0.5).

Install new packages with install.packages(...) and use them (e.g. dplyr::left_join(...)) as you normally would. Then, update the lockfile:

renv::snapshot()

Commit the changes to the lockfile (it is used to install dependencies in the Dockerfile). See the renv docs for more info.
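
For example, the full flow for adding a new dependency might look like this (dplyr is just a stand-in for whatever package you need):

# inside pipeline-runner folder
renv::restore()            # sync the project library with renv.lock first
install.packages("dplyr")  # install the new dependency
# ... use it in your code, e.g. dplyr::left_join(...)
renv::snapshot()           # record the package and its version in renv.lock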

Debugging locally

TLDR: save something inside /debug in a data processing or gem2s step to access it later from ./local-runner/debug.
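
For example, a minimal sketch of saving an intermediate object from inside a step (the object and file names here are hypothetical; /debug inside the container maps to ./local-runner/debug on your machine):

# inside a data processing or gem2s step (hypothetical names)
saveRDS(scdata, file.path("/debug", "scdata_after_step.rds"))

# later, from your local machine:
scdata <- readRDS("./local-runner/debug/scdata_after_step.rds")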

TLDR2: if the pipeline throws an error, tryCatchLog will save a dump file in ./local-runner/debug that can be used for inspecting the workspace and object values along the call stack.
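
To inspect a dump, you can use R's standard post-mortem debugger; the file name below is a placeholder, and the dump object inside it typically shares the file's name:

# replace the placeholder with the actual dump file name
load("./local-runner/debug/dump_20230101_120000.rda")
debugger(dump_20230101_120000)
# pick a frame number to browse objects at that point in the call stack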

To save the parameters (config, seurat_obj, etc.) passed to a data processing task function, specify DEBUG_STEP. Available tasks include all task names listed in run_processing_step in init.R, as well as DEBUG_STEP=all, which saves the parameters for all data processing task functions:

# e.g. DEBUG_STEP=dataIntegration
DEBUG_STEP=task_name make run

When the pipeline is run, it will save the parameters for the specified task_name in $(pwd)/debug. You can load these into your R environment:

# clicking the file in RStudio does this for you
load('{task_name}_{sample_id}.RData')

# if you need to load multiple tasks, you can load each into a separate environment
# you would then access objects using e.g. task_env$scdata
task_env <- new.env()
load('{task_name}_{sample_id}.RData', envir = task_env)
