snakemake-workflows / docs

Documentation of the Snakemake-Workflows project

Best practice for cross-workflow specification of files

SilasK opened this issue

I would like to discuss the best way to specify files so that they can be used across workflows.

Take the example of two workflows, e.g.:

Workflow 1: reads --> assembly

Workflow 2: assembly + reads --> assembly statistics ...

What is the best way to specify the reads and the assembly so that they can be used by different workflows?
Take into account that:
Requirement A: The reads might be used in multiple places in Workflow 2.
Requirement B: The reads will probably be used to infer the total number of samples in the target rule.

With sub-workflows, it would be possible to write otherworkflow(file).

But I think the recommended way now is to use modules and to import the rules of Workflows 1 and 2 into a new workflow.
But then I need to know which rules I have to modify to adapt the file specification. This would have to be documented in the README of each workflow.

I don't see how this can be done without massively modifying many rules of an imported workflow.

Any thoughts?

Here's a first attempt:

Workflow 1: the input reads are determined by a YAML configuration file, and the final assembly file is tagged, either in its contents (e.g. header lines) or in its filename, with a hash representing the input reads used to generate it, e.g. a hash of the read hashes.

Workflow 2 also takes its input reads and input assembly from a YAML configuration file. It checks, either on each run or through a dummy output, that the input reads recorded for the assembly match the set of input reads it was given.
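A minimal sketch of the "hash of read hashes" idea, in plain Python so it could be called from either workflow. The function names (`file_sha256`, `reads_fingerprint`, `assembly_matches_reads`) and the choice of SHA-256 are assumptions made for illustration; the thread does not prescribe a hash function or a place to store the fingerprint.

```python
import hashlib


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def reads_fingerprint(read_paths):
    """Hash of the sorted per-file hashes, so input order does not matter."""
    per_file = sorted(file_sha256(path) for path in read_paths)
    return hashlib.sha256("\n".join(per_file).encode()).hexdigest()


def assembly_matches_reads(assembly_fingerprint, read_paths):
    """True if the fingerprint stored with the assembly matches these reads."""
    return assembly_fingerprint == reads_fingerprint(read_paths)
```

Workflow 1 would write the value of `reads_fingerprint(...)` into the assembly header or filename, and Workflow 2 would call `assembly_matches_reads(...)` before doing any work. Note that hashing gzipped reads ties the fingerprint to the compressed bytes, not to the sequence content.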

Your idea would be to define the paths to the files.

Something like:

config.yaml

read_file_format: "QC/qc_reads/{sample}_{fraction}.fastq.gz"
assembly_file_format: "Assembly/assemblies/{sample}.fasta.gz"
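As a rough sketch of how both workflows could consume these two keys, the templates can be filled in with ordinary Python string formatting. The sample names, the fraction labels, and the helper names `read_paths` and `assembly_path` are hypothetical; only the two template strings come from the config above.

```python
# The two templates from config.yaml, as they would appear after loading.
config = {
    "read_file_format": "QC/qc_reads/{sample}_{fraction}.fastq.gz",
    "assembly_file_format": "Assembly/assemblies/{sample}.fasta.gz",
}

SAMPLES = ["sampleA", "sampleB"]  # hypothetical sample names
FRACTIONS = ["R1", "R2"]          # hypothetical read fractions


def read_paths(sample):
    """All read files of one sample, built from the shared template."""
    template = config["read_file_format"]
    return [template.format(sample=sample, fraction=f) for f in FRACTIONS]


def assembly_path(sample):
    """The assembly file of one sample, built from the shared template."""
    return config["assembly_file_format"].format(sample=sample)


for sample in SAMPLES:
    print(sample, read_paths(sample), assembly_path(sample))
```

Because every rule goes through the same two helpers, the reads can be used in several places (Requirement A) without repeating the path layout. Inside a Snakefile, Snakemake's `expand()` helper could fill the same templates.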

One could also use a TSV file whose column headers are specified in a config file.

Ideally using PEPs (Portable Encapsulated Projects): https://snakemake.readthedocs.io/en/stable/snakefiles/configuration.html#configuring-scientific-experiments-via-peps
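The TSV variant could look roughly like the sketch below, which uses only the standard library and not the PEP tooling. The config keys (`sample_column`, `reads_column`, `assembly_column`), the column names, and the comma-separated reads field are all assumptions for illustration.

```python
import csv
import io

# Hypothetical config: it names the TSV columns instead of hard-coding them.
config = {
    "sample_column": "sample_id",
    "reads_column": "reads_path",
    "assembly_column": "assembly_path",
}


def load_sample_table(handle, config):
    """Map each sample to its reads and assembly, using configured headers."""
    table = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        table[row[config["sample_column"]]] = {
            "reads": row[config["reads_column"]].split(","),
            "assembly": row[config["assembly_column"]],
        }
    return table


# In-memory stand-in for a real sample sheet.
example = io.StringIO(
    "sample_id\treads_path\tassembly_path\n"
    "sampleA\tA_R1.fastq.gz,A_R2.fastq.gz\tA.fasta.gz\n"
)
table = load_sample_table(example, config)
```

The number of samples needed by the target rule (Requirement B) is then simply `len(table)`, and each workflow only has to agree on the config keys, not on a directory layout.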