Difficulty inputting multiple files into the same pipeline.
jsolvason opened this issue
Hi there,
I am attempting to run multiple samples of a time-course study through a single bpipe pipeline. I have gone through your parallel tasks guide, but some questions still remain.
The pipeline I ultimately want would look something like:
Bpipe.run { "_S%_" * [ cutadapt + hisat2 + stringtie ] + ballgown }
I need this because ballgown performs differential analysis across multiple samples, so it must receive the .bam files from all samples at once, whereas cutadapt, hisat2, and stringtie each need only one sample's input (that is, one R1/R2 pair).
The simplest time course consists of:
H0_S1_R1.fastq
H0_S1_R2.fastq
H48_S2_R1.fastq
H48_S2_R2.fastq
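To make the intended grouping concrete, here is a small Python sketch of how I expect the `_S%_` pattern to partition these files into one R1/R2 pair per sample. It is an illustration only, not Bpipe's implementation: the helper `group_by_sample` is hypothetical, and modelling `%` as a run of non-underscore characters is my assumption.

```python
import re
from collections import defaultdict

# Hypothetical approximation of the "_S%_" split pattern.
# Assumption: "%" stands for a run of characters containing no underscore.
SAMPLE_PATTERN = re.compile(r"_S([^_]+)_")

def group_by_sample(filenames):
    """Group file names by the token captured between "_S" and the next "_"."""
    groups = defaultdict(list)
    for name in filenames:
        match = SAMPLE_PATTERN.search(name)
        if match is None:
            continue  # files without a sample token are skipped
        groups[match.group(1)].append(name)
    # Sort each group so that R1 comes before R2.
    return {sample: sorted(files) for sample, files in groups.items()}

files = [
    "H0_S1_R1.fastq",
    "H0_S1_R2.fastq",
    "H48_S2_R1.fastq",
    "H48_S2_R2.fastq",
]
groups = group_by_sample(files)
for sample, members in groups.items():
    print(sample, members)
```

Each group would then go through cutadapt, hisat2, and stringtie as its own branch, and only ballgown would see the results of every branch together.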
I have attempted the following:
// Trim adapter sequences
// No quality trimming added yet
cutadapt = {
    transform('_S%_*.fastq') to('.trim.fastq') {
        exec """
            $CUTADAPT -o $output1.trim.fastq -p $output2.trim.fastq
            -a $R1_THREE_PRIME_ADAPTER_SEQUENCE -A $R2_THREE_PRIME_ADAPTER_SEQUENCE
            -m 13
            --cores=$threads
            $input1.fastq $input2.fastq
            > ${input1}.processing
        """
        doc title:"Fastq adapter trimming"
    }
}
// Align reads to reference genome
hisat2 = {
    transform('*.trim.fastq') to('.alignconc.sam','.unaligned.sam','.ss.txt') {
        exec """
            $HISAT2 -x ${FILES}${REF_GENOME}
            -1 $input1.trim.fastq -2 $input2.trim.fastq
            -S $output.alignconc.sam
            --no-mixed
            --no-discordant
            --un-conc $output.unaligned.sam
            --rna-strandness FR
            --novel-splicesite-outfile $output.ss.txt
            -I $MIN_FRAGMENT -X $MAX_FRAGMENT
            --threads $threads
            > ${input1}.processing
        """
    }
}
Bpipe.run {
    "_S%_" * [ cutadapt + hisat2 ]
}
The following error is raised:
====================================================================================================
| Starting Pipeline at 2018-03-09 00:50 |
====================================================================================================
========================================== Stage cutadapt ==========================================
ERROR: Expected output file 48-MiSeq_S5_L001_R1_001.trim.fastq could not be found
========================================= Pipeline Failed ==========================================
Expected output file 48-MiSeq_S5_L001_R1_001.trim.fastq could not be found
Use 'bpipe errors' to see output from failed commands.
On inspection, commandlog.txt shows that only the first sample was run through the pipeline:
####################################################################################################
# Starting pipeline at Fri Mar 09 00:49:31 UTC 2018
# Input files: null
# Output Log: .bpipe/logs/4341.log
# Stage cutadapt
# ################ Finished at Fri Mar 09 00:49:32 UTC 2018 Duration = 0.449 seconds #################
####################################################################################################
# Starting pipeline at Fri Mar 09 00:50:40 UTC 2018
# Input files: [0-MiSeq_S8_L001_R1_001.fastq, 0-MiSeq_S8_L001_R2_001.fastq, 48-MiSeq_S5_L001_R1_001.fastq, 48-MiSeq_S5_L001_R2_001.fastq]
# Output Log: .bpipe/logs/4420.log
# Stage cutadapt
cutadapt -o 0-MiSeq_S8_L001_R1_001.trim.fastq -p 0-MiSeq_S8_L001_R2_001.trim.fastq -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC -A AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT -m 13 --cores=33 0-MiSeq_S8_L001_R1_001.fastq 0-MiSeq_S8_L001_R2_001.fastq > 0-MiSeq_S8_L001_R1_001.fastq.processing
# ################ Finished at Fri Mar 09 00:51:10 UTC 2018 Duration = 29.572 seconds ################
Any suggestions?
Joe