nf-core / atacseq

ATAC-seq peak-calling and QC analysis pipeline

Home Page: https://nf-co.re/atacseq

question for constructing samplesheet

pseudacriscrucifer opened this issue

Hi,

How would you recommend I assemble a samplesheet for nf-core/atacseq, given the following example input fastq.gz files?

12_B9_KOL4052A19_S2_L001_I1_001.fastq.gz
12_B9_KOL4052A19_S2_L001_R1_001.fastq.gz
12_B9_KOL4052A19_S2_L001_R2_001.fastq.gz
12_B9_KOL4052A19_S2_L001_R3_001.fastq.gz
12_B9_KOL4052A19_S2_L002_I1_001.fastq.gz
12_B9_KOL4052A19_S2_L002_R1_001.fastq.gz
12_B9_KOL4052A19_S2_L002_R2_001.fastq.gz
12_B9_KOL4052A19_S2_L002_R3_001.fastq.gz
13_B10_KOL4052A20_S4_L001_I1_001.fastq.gz
13_B10_KOL4052A20_S4_L001_R1_001.fastq.gz
13_B10_KOL4052A20_S4_L001_R2_001.fastq.gz
13_B10_KOL4052A20_S4_L001_R3_001.fastq.gz
13_B10_KOL4052A20_S4_L002_I1_001.fastq.gz
13_B10_KOL4052A20_S4_L002_R1_001.fastq.gz
13_B10_KOL4052A20_S4_L002_R2_001.fastq.gz
13_B10_KOL4052A20_S4_L002_R3_001.fastq.gz

These are paired-end sequencing files from two samples (12_B9 and 13_B10). Any help would be appreciated - I have tried numerous layouts for samplesheet.csv!

Depending on whether Rx (R1, R2, R3) refers to biological or technical replicates, your samplesheet would look something like:

sample,fastq_1,fastq_2,replicate
12_B9,12_B9_KOL4052A19_S2_L001_R1_001.fastq.gz,12_B9_KOL4052A19_S2_L002_R1_001.fastq.gz,1
12_B9,12_B9_KOL4052A19_S2_L001_R2_001.fastq.gz,12_B9_KOL4052A19_S2_L002_R2_001.fastq.gz,2
12_B9,12_B9_KOL4052A19_S2_L001_R3_001.fastq.gz,12_B9_KOL4052A19_S2_L002_R3_001.fastq.gz,3
13_B10,13_B10_KOL4052A20_S4_L001_R1_001.fastq.gz,13_B10_KOL4052A20_S4_L002_R1_001.fastq.gz,1
13_B10,13_B10_KOL4052A20_S4_L001_R2_001.fastq.gz,13_B10_KOL4052A20_S4_L002_R2_001.fastq.gz,2
13_B10,13_B10_KOL4052A20_S4_L001_R3_001.fastq.gz,13_B10_KOL4052A20_S4_L002_R3_001.fastq.gz,3

or

sample,fastq_1,fastq_2,replicate
12_B9,12_B9_KOL4052A19_S2_L001_R1_001.fastq.gz,12_B9_KOL4052A19_S2_L002_R1_001.fastq.gz,1
12_B9,12_B9_KOL4052A19_S2_L001_R2_001.fastq.gz,12_B9_KOL4052A19_S2_L002_R2_001.fastq.gz,1
12_B9,12_B9_KOL4052A19_S2_L001_R3_001.fastq.gz,12_B9_KOL4052A19_S2_L002_R3_001.fastq.gz,1
13_B10,13_B10_KOL4052A20_S4_L001_R1_001.fastq.gz,13_B10_KOL4052A20_S4_L002_R1_001.fastq.gz,1
13_B10,13_B10_KOL4052A20_S4_L001_R2_001.fastq.gz,13_B10_KOL4052A20_S4_L002_R2_001.fastq.gz,1
13_B10,13_B10_KOL4052A20_S4_L001_R3_001.fastq.gz,13_B10_KOL4052A20_S4_L002_R3_001.fastq.gz,1

In the latter case (technical replicates), the files will be merged after alignment but before any analysis.
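
For more than a handful of files, rows like the ones above can be generated rather than typed. The following is a minimal sketch that only mirrors the two layouts shown above (the L001 file in fastq_1, the L002 file in fastq_2, one row per Rx). The regular expression and the function names `build_rows` and `to_csv` are hypothetical, and the I1 files are skipped here. Adjust it if the lanes or reads mean something different in your experiment.

```python
import csv
import io
import re
from collections import defaultdict

# Hypothetical pattern for names like 12_B9_KOL4052A19_S2_L001_R1_001.fastq.gz
FASTQ_RE = re.compile(
    r"^(?P<sample>\d+_[A-Za-z0-9]+)_.+_L(?P<lane>\d{3})_(?P<read>[RI]\d)_001\.fastq\.gz$"
)

def build_rows(filenames, technical=False):
    """Group files by (sample, Rx) and put the L001/L002 files on one row."""
    groups = defaultdict(dict)
    for name in filenames:
        m = FASTQ_RE.match(name)
        if m is None or not m.group("read").startswith("R"):
            continue  # skip names that do not parse, and the I1 files
        groups[(m.group("sample"), m.group("read"))][m.group("lane")] = name
    rows = []
    for (sample, read), lanes in sorted(groups.items()):
        # Biological layout: replicate follows the Rx number.
        # Technical layout: every row gets replicate 1.
        replicate = 1 if technical else int(read[1:])
        # Raises KeyError if a lane file is missing, which is intentional here.
        rows.append([sample, lanes["001"], lanes["002"], replicate])
    return rows

def to_csv(rows):
    """Render rows as samplesheet text with the header used above."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["sample", "fastq_1", "fastq_2", "replicate"])
    writer.writerows(rows)
    return buf.getvalue()
```

Calling `to_csv(build_rows(names))` gives the first layout, and `to_csv(build_rows(names, technical=True))` gives the second.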

Both examples still leave out the *_I1_001.fastq.gz files. If these are additional replicates, add them accordingly. However, if they are input controls, run the pipeline with the --with_control flag; your samplesheet would then look like:

sample,fastq_1,fastq_2,replicate,control,control_replicate
12_B9,12_B9_KOL4052A19_S2_L001_R1_001.fastq.gz,12_B9_KOL4052A19_S2_L002_R1_001.fastq.gz,1,12_B9_INPUT_CTRL,1
12_B9,12_B9_KOL4052A19_S2_L001_R2_001.fastq.gz,12_B9_KOL4052A19_S2_L002_R2_001.fastq.gz,2,12_B9_INPUT_CTRL,1
12_B9,12_B9_KOL4052A19_S2_L001_R3_001.fastq.gz,12_B9_KOL4052A19_S2_L002_R3_001.fastq.gz,3,12_B9_INPUT_CTRL,1
12_B9_INPUT_CTRL,12_B9_KOL4052A19_S2_L001_I1_001.fastq.gz,12_B9_KOL4052A19_S2_L002_I1_001.fastq.gz,1,,
13_B10,13_B10_KOL4052A20_S4_L001_R1_001.fastq.gz,13_B10_KOL4052A20_S4_L002_R1_001.fastq.gz,1,13_B10_INPUT_CTRL,1
13_B10,13_B10_KOL4052A20_S4_L001_R2_001.fastq.gz,13_B10_KOL4052A20_S4_L002_R2_001.fastq.gz,2,13_B10_INPUT_CTRL,1
13_B10,13_B10_KOL4052A20_S4_L001_R3_001.fastq.gz,13_B10_KOL4052A20_S4_L002_R3_001.fastq.gz,3,13_B10_INPUT_CTRL,1
13_B10_INPUT_CTRL,13_B10_KOL4052A20_S4_L001_I1_001.fastq.gz,13_B10_KOL4052A20_S4_L002_I1_001.fastq.gz,1,,
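
With control columns it is easy to reference a control name that has no row of its own, for example after a typo or a change in letter case. Below is a small sanity check, sketched under the assumption that each control/control_replicate pair is meant to match the sample/replicate pair of another row in the same sheet. The function name `missing_controls` is hypothetical and is not part of the pipeline.

```python
import csv
import io

def missing_controls(samplesheet_text):
    """Return (control, control_replicate) pairs that no row defines."""
    rows = list(csv.DictReader(io.StringIO(samplesheet_text)))
    # Every (sample, replicate) pair that has its own row in the sheet.
    defined = {(r["sample"], r["replicate"]) for r in rows}
    # Every control referenced by a row; rows with an empty control are skipped.
    referenced = {
        (r["control"], r["control_replicate"])
        for r in rows
        if r.get("control")
    }
    return sorted(referenced - defined)
```

An empty list means every referenced control has a matching row. Anything else lists the pairs to fix before starting a run.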

As there has been no further activity on this issue, I will close it now. Feel free to reach out to us if you have any further questions, @pseudacriscrucifer.