Nxfvars: Parameterize Notebooks from Nextflow
Nxfvars makes it easy to parameterize Jupyter notebooks, Rmarkdown notebooks, or plain
Python scripts from a Nextflow process. All variables accessible in a process's script
section are made available directly in the notebook.
Using nxfvars in a Nextflow pipeline
Download nxfvars.nf and add the script to your pipeline. Import the nxfvars function and call it from the script section of your process:
nextflow.enable.dsl = 2
include { nxfvars } from "./nxfvars.nf"
process foo {
script:
"""
${nxfvars(task)}
# run script or execute notebook here
"""
}
When the process is executed, nxfvars generates a .params.yml file in the work directory.
It contains all variables that can be accessed in the script section. The YAML file can be
consumed by the nxfvars Python library, Papermill, or any YAML parser (see below).
Usage with the nxfvars Python library
Full examples at examples/nxfvars_python_script and examples/nxfvars_python_notebook.
The nxfvars Python library is a thin wrapper around a YAML parser. It can be used from both Jupyter notebooks and plain Python scripts. You can install it using pip:
pip install nxfvars
In Python, Nextflow variables can be accessed through the nxfvars object:
from nxfvars import nxfvars
print(nxfvars["foo"])
print(nxfvars["params"]["bar"])
print(nxfvars["task"]["cpus"])
It is common to execute notebooks interactively during development and run them later
with parameters. In that case, you can use .get() to supply default values that apply
when a .params.yml file is not yet present:
nxfvars.get("foo", "default value for development")
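The fallback pattern can be tried without a pipeline run. The following is a minimal sketch in which plain dictionaries stand in for the nxfvars object; `dev_vars`, `run_vars`, and `read_settings` are hypothetical names used only for illustration, and the sketch assumes the real object supports `.get()` on nested entries the same way a dict does.

```python
# Stand-in for the nxfvars object during interactive development,
# assuming it behaves like an empty mapping when no .params.yml exists.
dev_vars = {}

# Stand-in for the same object inside a Nextflow run (values are made up).
run_vars = {"foo": "value from Nextflow", "task": {"cpus": 8}}


def read_settings(nxfvars):
    """Collect the settings a notebook needs, with development defaults."""
    return {
        "foo": nxfvars.get("foo", "default value for development"),
        # Nested lookups need a default at each level.
        "cpus": nxfvars.get("task", {}).get("cpus", 1),
    }


print(read_settings(dev_vars))
print(read_settings(run_vars))
```

Keeping all lookups in one place like this makes it easy to see which defaults a notebook relies on when it runs outside the pipeline.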
From Nextflow, just invoke the Python script, or use e.g. jupyter nbconvert to
execute the notebook.
nxfvars execute is a convenient wrapper around jupytext and jupyter nbconvert that
executes notebooks in arbitrary jupytext formats and converts them to an HTML report.
process nxfvars_python {
script:
"""
${nxfvars(task)}
# simply execute the script here
python my_script.py
# or execute the notebook
nxfvars execute notebook.ipynb report.html
"""
}
Usage with Papermill
Full example at examples/papermill
Papermill is an established library for parameterizing Jupyter notebooks. It can readily consume YAML files generated with nxfvars.
process papermill {
output:
file("report.html"), emit: report
script:
"""
${nxfvars(task)}
papermill some_notebook.ipynb notebook_executed.ipynb -f .params.yml -k python3
# optional: convert to HTML report
jupyter nbconvert --to html --output report.html notebook_executed.ipynb
"""
}
Usage with Rmarkdown
Full example at examples/rmarkdown
For now, we use the following R snippet (render.R) to parse the YAML file and
render the notebook with rmarkdown. This could be simplified in the future by
porting the nxfvars library to R.
# USAGE: render.R notebook.Rmd report.html
args = commandArgs(trailingOnly=TRUE)
nxfvars = list(nxfvars = yaml::read_yaml('.params.yml'))
rmarkdown::render(args[1], params = nxfvars, output_file=args[2])
process rmarkdown {
stageInMode "copy" // work around https://github.com/rstudio/rmarkdown/issues/1508
output:
file("report.html"), emit: report
script:
"""
${nxfvars(task)}
render.R 'notebook.Rmd' 'report.html'
"""
}
How it works
All variables in a Nextflow process (except local variables declared with def) can be
programmatically accessed through Nextflow's implicit variables this and task.
See also my blog post about these variables.
The nxfvars(task) function encodes all variables as YAML and injects them into the
bash script.
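To make the "encode and inject" idea concrete, here is a rough Python sketch of how a dictionary of variables could be turned into a bash snippet that writes .params.yml. It is not the actual nxfvars.nf implementation: the function name `render_params_snippet`, the heredoc approach, and the `NXFVARS_EOF` delimiter are assumptions made for illustration. The values are serialized as JSON, which YAML 1.2 is designed to accept, so the sketch needs only the standard library.

```python
import json


def render_params_snippet(variables, filename=".params.yml"):
    """Return a bash snippet that writes `variables` to `filename`.

    Hypothetical helper: JSON output is used because YAML 1.2 is designed
    as a superset of JSON, so a YAML parser should be able to read it.
    """
    body = json.dumps(variables, indent=2, sort_keys=True)
    # Quoting the heredoc delimiter stops bash from expanding `$` or
    # backticks that may appear inside the serialized values.
    return f"cat <<'NXFVARS_EOF' > {filename}\n{body}\nNXFVARS_EOF\n"


# Example values only; a real process would pass its own variables.
snippet = render_params_snippet(
    {"foo": "hello", "params": {"bar": 42}, "task": {"cpus": 4}}
)
print(snippet)
```

Placing such a snippet at the top of the script block means the parameter file exists in the work directory before the notebook or script starts.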