ssadedin / bpipe

Bpipe - a tool for running and managing bioinformatics pipelines

Home Page: http://docs.bpipe.org/

Difficulty inputting multiple files into the same pipeline.

jsolvason opened this issue

Hi there,

I am attempting to run multiple samples of a time-course study through a single bpipe pipeline. I have gone through your parallel tasks guide, but some questions still remain.

The pipeline I ultimately want would look something like this:

Bpipe.run { "_S%_" * [ cutadapt + hisat2 + stringtie ] + ballgown }

I need this because ballgown performs differential analysis across multiple samples, so it must receive several .bam files at once, whereas cutadapt, hisat2, and stringtie each take only one input (or rather, one R1/R2 pair) per sample.
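The shape described above is a fan-out followed by a fan-in: each sample runs through its own chain, and the final stage receives every branch's output together. Below is a minimal Python sketch of that data flow only. It is not Bpipe code; the function names, sample names, and `.bam` naming are hypothetical placeholders that just track file names.

```python
def per_sample_chain(sample_id, fastq_pair):
    """Stand-in for cutadapt + hisat2 + stringtie on one R1/R2 pair."""
    r1, r2 = fastq_pair  # each branch sees exactly one pair
    return f"{sample_id}.bam"  # hypothetical output name


def merge_stage(bam_files):
    """Stand-in for ballgown: needs all per-sample outputs at once."""
    return sorted(bam_files)


# Hypothetical samples, one R1/R2 pair each
samples = {
    "sampleA": ("sampleA_R1.fastq", "sampleA_R2.fastq"),
    "sampleB": ("sampleB_R1.fastq", "sampleB_R2.fastq"),
}

# Fan-out: one branch per sample
branch_outputs = [per_sample_chain(sid, pair) for sid, pair in samples.items()]

# Fan-in: a single stage receives every branch's output
merged_inputs = merge_stage(branch_outputs)
print(merged_inputs)
```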

The simplest time course consists of these files:

H0_S1_R1.fastq
H0_S1_R2.fastq
H48_S2_R1.fastq
H48_S2_R2.fastq
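The grouping that the "_S%_" pattern is meant to produce for these files can be sketched as follows. This is plain Python, not Bpipe, and it rests on an assumption: `%` is modeled here as "the run of non-underscore characters between `_S` and the next `_`", which may not be how Bpipe actually resolves the pattern. The helper name `group_by_sample` is hypothetical.

```python
import re
from collections import defaultdict


def group_by_sample(filenames, pattern=r"_S([^_]+)_"):
    """Group file names by the text captured between '_S' and the next '_'.

    Assumption: this regex only approximates the intent of "_S%_";
    it is not taken from Bpipe's implementation.
    """
    groups = defaultdict(list)
    for name in sorted(filenames):
        match = re.search(pattern, name)
        if match:
            groups[match.group(1)].append(name)
    return dict(groups)


files = [
    "H0_S1_R1.fastq",
    "H0_S1_R2.fastq",
    "H48_S2_R1.fastq",
    "H48_S2_R2.fastq",
]

groups = group_by_sample(files)
for sample, members in groups.items():
    print(sample, members)
```

Under that assumption each sample key maps to exactly one R1/R2 pair, which is what each parallel branch should receive.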

I have attempted the following:

// Trim adapter sequences
// No quality trimming added yet
cutadapt = {
	transform('_S%_*.fastq') to('.trim.fastq') {
		exec """
			$CUTADAPT -o $output1.trim.fastq -p $output2.trim.fastq 
			-a $R1_THREE_PRIME_ADAPTER_SEQUENCE -A $R2_THREE_PRIME_ADAPTER_SEQUENCE 
			-m 13 
			--cores=$threads
			$input1.fastq $input2.fastq 
			> ${input1}.processing
		"""
		doc title:"Fastq adapter trimming"
	}
}

// Align reads to reference genome
hisat2 = {
	transform('*.trim.fastq') to('.alignconc.sam','.unaligned.sam','.ss.txt') {
		exec """
			$HISAT2 -x ${FILES}${REF_GENOME} 
			-1 $input1.trim.fastq -2 $input2.trim.fastq 
			-S $output.alignconc.sam
			--no-mixed 
			--no-discordant
			--un-conc $output.unaligned.sam
			--rna-strandness FR
			--novel-splicesite-outfile $output.ss.txt 
			-I $MIN_FRAGMENT -X $MAX_FRAGMENT 
			--threads $threads
			> ${input1}.processing
		"""
	}
}

Bpipe.run {
	"_S%_" * [ cutadapt + hisat2 ] 
}

The following error is raised:

====================================================================================================
|                              Starting Pipeline at 2018-03-09 00:50                               |
====================================================================================================

========================================== Stage cutadapt ==========================================
ERROR: Expected output file 48-MiSeq_S5_L001_R1_001.trim.fastq could not be found 


========================================= Pipeline Failed ==========================================

Expected output file 48-MiSeq_S5_L001_R1_001.trim.fastq could not be found

Use 'bpipe errors' to see output from failed commands.

On inspection, commandlog.txt shows that a cutadapt command was run only for the first sample; nothing was logged for the second sample or for hisat2:

####################################################################################################
# Starting pipeline at Fri Mar 09 00:49:31 UTC 2018
# Input files:  null
# Output Log:  .bpipe/logs/4341.log
# Stage cutadapt
# ################ Finished at Fri Mar 09 00:49:32 UTC 2018 Duration = 0.449 seconds #################


####################################################################################################
# Starting pipeline at Fri Mar 09 00:50:40 UTC 2018
# Input files:  [0-MiSeq_S8_L001_R1_001.fastq, 0-MiSeq_S8_L001_R2_001.fastq, 48-MiSeq_S5_L001_R1_001.fastq, 48-MiSeq_S5_L001_R2_001.fastq]
# Output Log:  .bpipe/logs/4420.log
# Stage cutadapt
cutadapt -o 0-MiSeq_S8_L001_R1_001.trim.fastq -p 0-MiSeq_S8_L001_R2_001.trim.fastq  -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC -A AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT  -m 13  --cores=33 0-MiSeq_S8_L001_R1_001.fastq 0-MiSeq_S8_L001_R2_001.fastq  > 0-MiSeq_S8_L001_R1_001.fastq.processing
# ################ Finished at Fri Mar 09 00:51:10 UTC 2018 Duration = 29.572 seconds ################

Any suggestions?

Joe