snakemake-workflows / rna-seq-star-deseq2

RNA-seq workflow using STAR and DESeq2

trim.smk file name

neavemj opened this issue

Good afternoon, thanks for the workflow!

I'm running into a problem when I run the pipeline with trimming enabled. I get the following error:

Missing input files for rule align:
results/trimmed/21_02764_01_S1_lane1_R2.fastq.gz
results/trimmed/21_02764_01_S1_lane1_R1.fastq.gz

After reading through the snake files, I think it is because the output from trim.smk has a slightly different name than expected by align.smk. For example:

trim.smk, line 29: fastq1="results/trimmed/{sample}-{unit}_R1.fastq.gz",
common.smk, line 98: "results/trimmed/{sample}_{unit}_{group}.fastq.gz",

Note that the first pattern separates the {sample} and {unit} wildcards with a hyphen, while the second separates them with an underscore.

Could this be causing the 'missing input file' error?
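One way to check this outside Snakemake is to fill both patterns with the same wildcard values and compare the resulting paths. The sketch below is plain Python, not part of the workflow; the two patterns are the ones quoted above, and the split of the file name into sample "21_02764_01_S1" and unit "lane1" is an assumption based on the error message.

```python
# Patterns as quoted from trim.smk (line 29) and common.smk (line 98).
TRIM_OUTPUT = "results/trimmed/{sample}-{unit}_R1.fastq.gz"
ALIGN_INPUT = "results/trimmed/{sample}_{unit}_{group}.fastq.gz"

# Assumed wildcard values, inferred from the path in the error message.
values = {"sample": "21_02764_01_S1", "unit": "lane1", "group": "R1"}

# str.format ignores keyword arguments the pattern does not use,
# so the same dict can fill both patterns.
produced = TRIM_OUTPUT.format(**values)   # what the trimming rule would write
requested = ALIGN_INPUT.format(**values)  # what the align rule would ask for

print("produced: ", produced)
print("requested:", requested)
print("match:    ", produced == requested)
```

With these values the requested path is the one listed in the error, while the produced path has a hyphen before the unit, so no rule output matches the requested input.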

Thanks again!

Matt.

commented

I get the same error when I enable trimming. Everything runs fine when it is not enabled.
At the moment, line 98 of common.smk reads:
"results/trimmed/{sample}_{unit}_{group}.fastq.gz",

which I changed to:
"results/trimmed/{sample}-{unit}_{group}.fastq.gz",

The first underscore becomes a hyphen, which matches what is in trim.smk. This is very similar to the issue Matt/neavemj described. Thanks Matt!

Thanks for identifying this problem so clearly, @neavemj. Would you like to propose the needed change in a pull request, so we can document it as your contribution to the code base? Or would you prefer that I quickly implement the fix myself, with an acknowledgement pointing to this issue?

My suggestion would be to switch to all underscores in the file names. If this is always the default, this should help avoid further similar mixups...
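As a rough illustration of how such mixups could be made harder, the file name pattern could be defined once and reused by both the producing and the consuming rule. This is only a sketch, not necessarily what the eventual fix does; `TRIMMED_TEMPLATE` and `trimmed_pattern` are hypothetical names that do not exist in the workflow.

```python
# Hypothetical single source of truth for trimmed read file names.
# Doubled braces survive the first .format() call as literal {sample} and {unit},
# so the result is still a pattern with those two wildcards left open.
TRIMMED_TEMPLATE = "results/trimmed/{{sample}}_{{unit}}_{group}.fastq.gz"


def trimmed_pattern(group):
    """Return the trimmed-reads pattern for one read group (e.g. "R1")."""
    return TRIMMED_TEMPLATE.format(group=group)


# A trimming rule could use these as its outputs ...
fastq1 = trimmed_pattern("R1")
fastq2 = trimmed_pattern("R2")

# ... and an input function could fill in the remaining wildcards.
example_input = fastq1.format(sample="21_02764_01_S1", unit="lane1")

print(fastq1)
print(example_input)
```

Because both sides derive their paths from the same template, a change of separator only has to be made in one place.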

Finally closed by #61.