ssadedin / bpipe

Bpipe - a tool for running and managing bioinformatics pipelines

Home Page: http://docs.bpipe.org/

transform() with files outside of current directory

spco opened this issue

If I use the following script:

setup = {
        forward "/path/to/data/BC19/BC19_trimmed_aln_PE.sam"
}

samtools_view = {
        // Convert file from SAM to BAM format
        println input
        transform("bam"){
                exec "samtools view -b $input > $output"
        }
        println output
}

Bpipe.run { setup + samtools_view }

I get the following result:

[>]$ bpipe-0.9.9.5/bin/bpipe al.pipe 
====================================================================================================
|                              Starting Pipeline at 2018-04-30 17:19                               |
====================================================================================================

=========================================== Stage setup ============================================

======================================= Stage samtools_view ========================================
/path/to/data/BC19/BC19_trimmed_aln_PE.sam
BC19_trimmed_aln_PE.bam

======================================== Pipeline Succeeded ========================================
17:19:34 MSG:  Finished at Mon Apr 30 17:19:34 BST 2018

This isn't the desired behaviour: I want the output of samtools_view to be written to /path/to/data/BC19/BC19_trimmed_aln_PE.bam. Is there a way to do this with the transform keyword, or do I need to revert to something like

exec "samtools view -b $input.bam > $input.sam"

?

If I'm understanding correctly, what you want is for the BAM file to get created in the same directory as the SAM, even though that's a different location from where you are running your pipeline?

Bpipe always has a contextual "output directory" to which outputs are directed. By default that is the directory where you run the pipeline, but you can point it elsewhere by setting the output.dir property. For a fixed directory this is very simple:

samtools_view = {

        output.dir = "/path/to/data/BC19" // <==== here we can set where outputs go

        // Convert file from SAM to BAM format
        println input
        transform("bam"){
                exec "samtools view -b $input > $output"
        }
        println output
}

If you want a general stage whose outputs always go where its inputs came from, you can do it with a little Groovy code that works out the location:

samtools_view = {

        // Figure out the parent directory of the input and set it
        // as our output directory
        output.dir = file(input).absoluteFile.parentFile.absolutePath

        // Convert file from SAM to BAM format
        println input
        transform("bam"){
                exec "samtools view -b $input > $output"
        }
        println output
}
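
To see what that path expression evaluates to, here is a minimal sketch in plain Groovy that can be run outside Bpipe. It assumes java.io.File as a stand-in for Bpipe's file() helper, and it only emulates the extension swap on the file name, based on the output shown earlier in this thread; outputDir and bamName are names made up for the illustration.

```groovy
// Plain Groovy, no Bpipe required. The path is the example used in this thread.
String input = "/path/to/data/BC19/BC19_trimmed_aln_PE.sam"

// Same property chain as in the stage above, with java.io.File standing in
// for Bpipe's file() helper (an assumption made for this sketch).
String outputDir = new File(input).absoluteFile.parentFile.absolutePath

// Emulate the extension swap seen in the pipeline output: .sam becomes .bam.
// This only illustrates the file name, not how Bpipe computes it internally.
String bamName = new File(input).name.replaceAll(/\.sam$/, ".bam")

println(outputDir)
println(new File(outputDir, bamName).path)
```

The directory comes purely from the input path, so the same stage works for any sample directory without hard-coding it.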

Just as an aside: Bpipe works this way to isolate the files coming in as input (raw data) from the files that are products of the pipeline. You can keep your inputs in one directory and run your pipeline in another. If you are not happy with the result, it is safe to delete the whole pipeline directory, because you know everything in it is derived from the input data.

Thanks @ssadedin for the clear explanation, that makes sense.