grst / nxfvars

Access nextflow variables from python scripts or notebooks

Nxfvars: Parameterize Notebooks from Nextflow

Nxfvars makes it easy to parameterize Jupyter notebooks, Rmarkdown notebooks, or plain Python scripts from a Nextflow process. All variables accessible in a process's script section are made available directly in the notebook.

Using nxfvars in a Nextflow pipeline

Download nxfvars.nf and add the script to your pipeline. Import the nxfvars function and call it from the script section of your process:

nextflow.enable.dsl = 2
include { nxfvars } from "./nxfvars.nf"

process foo {
    script:
    """
    ${nxfvars(task)}

    # run script or execute notebook here
    """
}

When the process is executed, nxfvars generates a .params.yml file in the work directory. It contains all variables that can be accessed in the script section. The YAML file can be consumed by the nxfvars Python library, Papermill, or any YAML parser (see below).

Usage with the nxfvars Python library

Full examples at examples/nxfvars_python_script and examples/nxfvars_python_notebook.

The nxfvars Python library is a thin wrapper around a YAML parser. It can be used from both Jupyter notebooks and plain Python scripts. Install it with pip:

pip install nxfvars

In Python, Nextflow variables can be accessed through the nxfvars object:

from nxfvars import nxfvars

print(nxfvars["foo"])
print(nxfvars["params"]["bar"])
print(nxfvars["task"]["cpus"])

It is common to run notebooks interactively during development and execute them later with parameters. In that case, use .get() to supply default values when a .params.yml is not yet present:

nxfvars.get("foo", "default value for development")
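
The same pattern extends to nested entries such as params and task. The following is a minimal sketch, not part of nxfvars: it uses plain dicts as stand-ins for the nxfvars object, and read_settings is a hypothetical helper name introduced only for illustration.

```python
# Stand-in for the nxfvars object during interactive development:
# empty, because no .params.yml exists yet (assumption for this sketch).
dev_vars = {}

# Stand-in for the same object when the notebook runs inside a process.
pipeline_vars = {"params": {"bar": 42}, "task": {"cpus": 8}}


def read_settings(variables):
    """Collect the values a notebook needs, falling back to development defaults."""
    return {
        # Each .get() supplies a default if the key is missing at that level.
        "bar": variables.get("params", {}).get("bar", 1),
        "cpus": variables.get("task", {}).get("cpus", 2),
    }


print(read_settings(dev_vars))
print(read_settings(pipeline_vars))
```

Keeping all defaults in one place at the top of a notebook makes it easy to see which values the pipeline is expected to override.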

From Nextflow, simply invoke the Python script, or use a tool such as jupyter nbconvert to execute the notebook.

nxfvars execute is a convenience wrapper around jupytext and jupyter nbconvert that executes a notebook in any jupytext-supported format and converts it to an HTML report.

process nxfvars_python {
    script:
    """
    ${nxfvars(task)}

    # simply execute the script here
    python my_script.py
    # or execute the notebook
    nxfvars execute notebook.ipynb report.html
    """
}

Usage with Papermill

Full example at examples/papermill

Papermill is an established library for parameterizing Jupyter notebooks. It can directly consume the YAML files generated by nxfvars.

process papermill {

    output:
        file("report.html"), emit: report

    script:
    """
    ${nxfvars(task)}

    papermill some_notebook.ipynb notebook_executed.ipynb -f .params.yml -k python3
    # optional: convert to HTML report
    jupyter nbconvert --to html --output report.html notebook_executed.ipynb
    """
}

Usage with Rmarkdown

Full example at examples/rmarkdown

For now, we use the following R snippet (render.R) to parse the YAML file and render the notebook with rmarkdown. Porting the nxfvars library to R could simplify this in the future.

# USAGE: render.R notebook.Rmd report.html
args = commandArgs(trailingOnly=TRUE)
nxfvars = list(nxfvars = yaml::read_yaml('.params.yml'))
rmarkdown::render(args[1], params = nxfvars, output_file=args[2])

process rmarkdown {
    stageInMode "copy" // work around https://github.com/rstudio/rmarkdown/issues/1508
    output:
        file("report.html"), emit: report

    script:
    """
    ${nxfvars(task)}

    render.R 'notebook.Rmd' 'report.html'
    """
}

How it works

All variables in a Nextflow process (except local variables declared with def) can be programmatically accessed through Nextflow's implicit variables this and task. See also my blog post about these variables.

The nxfvars(task) function encodes all variables as YAML and injects them into the bash script.
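
To illustrate the idea of injecting serialized variables into a bash script, here is a minimal Python sketch. It is not the code in nxfvars.nf: the function name render_params_snippet, the heredoc approach, and the NXFVARS_EOF delimiter are all assumptions made for illustration. It serializes to JSON, which most YAML parsers accept, so that the example needs only the standard library.

```python
import json


def render_params_snippet(variables, filename=".params.yml"):
    """Return a bash snippet that writes `variables` to `filename`.

    Hypothetical helper: JSON is used for serialization because JSON
    documents are generally readable by YAML parsers.
    """
    payload = json.dumps(variables, indent=2, sort_keys=True)
    # Quoting the heredoc delimiter stops bash from expanding `$` or
    # backticks that may occur inside the payload.
    return f"cat <<'NXFVARS_EOF' > {filename}\n{payload}\nNXFVARS_EOF\n"


snippet = render_params_snippet({"foo": "bar", "task": {"cpus": 4}})
print(snippet)
```

Placing such a snippet at the top of a process script means the parameter file exists in the work directory before the notebook or script runs.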

About

License: MIT License


Languages

Python 74.3%, Nextflow 25.7%