transform() with files outside of current directory
spco opened this issue
If I use the following script:
setup = {
    forward "/path/to/data/BC19/BC19_trimmed_aln_PE.sam"
}
samtools_view = {
    // Convert file from SAM to BAM format
    println input
    transform("bam"){
        exec "samtools view -b $input > $output"
    }
    println output
}
Bpipe.run { setup + samtools_view }
I get the following result:
[>]$ bpipe-0.9.9.5/bin/bpipe al.pipe
====================================================================================================
| Starting Pipeline at 2018-04-30 17:19 |
====================================================================================================
=========================================== Stage setup ============================================
======================================= Stage samtools_view ========================================
/path/to/data/BC19/BC19_trimmed_aln_PE.sam
BC19_trimmed_aln_PE.bam
======================================== Pipeline Succeeded ========================================
17:19:34 MSG: Finished at Mon Apr 30 17:19:34 BST 2018
This isn't the desired behaviour - I would want the output of samtools_view to be in /path/to/data/BC19/BC19_trimmed_aln_PE.bam. Is there a way to use the transform keyword to do this, or do I need to revert to something like exec "samtools view -b $input.bam > $input.sam"?
If I'm understanding correctly, what you want is for the BAM file to get created in the same directory as the SAM, even though that's a different location from where you are running your pipeline?
Bpipe always keeps track of a contextual "output directory" where outputs are written. By default that is the directory you run the pipeline from, but you can point it elsewhere by setting the output.dir property. If it's a fixed directory, that is very simple:
samtools_view = {
    output.dir = "/path/to/data/BC19" // <==== here we can set where outputs go
    // Convert file from SAM to BAM format
    println input
    transform("bam"){
        exec "samtools view -b $input > $output"
    }
    println output
}
If you want to make a general command whose outputs always go where the inputs came from, you can do it with just a little Groovy code to figure out the location:
samtools_view = {
    // Figure out the parent directory of the input and set it
    // as our output directory
    output.dir = file(input).absoluteFile.parentFile.absolutePath
    // Convert file from SAM to BAM format
    println input
    transform("bam"){
        exec "samtools view -b $input > $output"
    }
    println output
}
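To see what the output.dir expression above evaluates to, here is a minimal sketch in plain Groovy that can be run outside Bpipe. It assumes Bpipe's file(input) behaves like a java.io.File for this purpose, so it substitutes new File(path); parentDirOf is a hypothetical helper name used only for illustration.

```groovy
// Plain Groovy, no Bpipe required.
// parentDirOf is a hypothetical helper, not part of Bpipe.
String parentDirOf(String path) {
    // Make the path absolute first (relative paths resolve against the
    // current working directory), then take the containing directory.
    new File(path).absoluteFile.parentFile.absolutePath
}

// The file does not need to exist; only the path string is inspected.
// On a Unix-like system this prints /path/to/data/BC19
println parentDirOf("/path/to/data/BC19/BC19_trimmed_aln_PE.sam")

// A bare file name resolves to the current working directory.
println parentDirOf("BC19_trimmed_aln_PE.sam")

// Without absoluteFile, a bare file name has no parent at all.
println new File("BC19_trimmed_aln_PE.sam").parentFile == null
```

The absoluteFile step matters when the input is given as a bare file name: without it parentFile is null, so there would be no directory to assign to output.dir.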
Just as an aside: Bpipe works this way to isolate the files coming in as input (raw data) from the files that are products of the pipeline. You can keep your inputs in a different directory and run your pipeline, and if you are not happy with the result, it is safe to just delete the whole pipeline directory, because you know everything in there is derived from the input data.