The following instructions describe the pipeline used to process Arabidopsis CAP-C libraries in the manuscript.
The following software is required to run the Bash pipeline:
- pigz
- SeqPrep
- cutadapt
- python 3.7
- HiC-Pro (>=3.0.0)
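Before running the pipeline, it can help to confirm that the required tools are on your PATH. A minimal check is sketched below; the binary names are assumptions (HiC-Pro installs a `HiC-Pro` executable, and Python 3 may be `python` rather than `python3` on some systems):

```shell
# Check that each required tool is on PATH; binary names are assumptions
for tool in pigz SeqPrep cutadapt python3 HiC-Pro; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: found"
  else
    echo "$tool: MISSING"
  fi
done | tee dependency_check.txt
```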
The pipeline reads from and writes to the following folders:

- 01-CAPC
  - CAPC_R1.fq.gz
  - CAPC_R2.fq.gz
- 02-cutadapt
  - CAPC_merged_001_T1.fastq.gz
  - CAPC_merged_001_T2.fastq.gz
  - CAPC_unmerged_001_A1.fastq.gz
  - CAPC_unmerged_001_G1.fastq.gz
  - CAPC_unmerged_001_A2.fastq.gz
  - CAPC_unmerged_001_G2.fastq.gz
  - ......
- 03-after_filter
  - CAPC*{OK,Short}_R1*.fastq.gz
  - CAPC*{OK,Short}_R2*.fastq.gz
- 04-fastq
  - CAPC/CAPC_merged_R1.fastq.gz
  - CAPC/CAPC_merged_R2.fastq.gz
- 20-split
  - CAPC_R1_001.fastq.gz
  - CAPC_R1_002.fastq.gz
  - ......
- 21-merge
  - CAPC_merged_001.fastq.gz
  - CAPC_unmerged_R1_001.fastq.gz
  - CAPC_unmerged_R2_001.fastq.gz
  - ......
When the Bash script has finished, you can remove the folders 02-cutadapt, 03-after_filter, 20-split, and 21-merge. The clean FASTQ files are stored in 04-fastq.
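The cleanup can be scripted; the paths are the intermediate folder names listed above:

```shell
# Remove intermediate folders once the pipeline has finished successfully
for d in 02-cutadapt 03-after_filter 20-split 21-merge; do
  rm -rf "$d"
done
```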
This Bash script cleans the raw sequencing FASTQ files; you can then use the clean FASTQ files for mapping.
Since SeqPrep, used in the third step, is effectively single-threaded, the original FASTQ files should first be split into smaller FASTQ files.
To reduce storage usage, compress the split FASTQ files. We use pigz, which compresses sequencing files with multiple threads, speeding up data processing.
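The split-and-compress steps can be sketched as below. The chunk size and chunk names are illustrative assumptions, and a toy input stands in for the real CAPC_R1.fq.gz; note that split's -l value must be a multiple of 4, since each FASTQ record spans four lines.

```shell
mkdir -p 20-split
# Toy input standing in for the real CAPC_R1.fq.gz (3 reads)
printf '@r%d\nACGT\n+\nIIII\n' 1 2 3 | gzip > CAPC_R1.fq.gz

# Split into chunks of 2 reads (8 lines) each; in practice use millions of lines
zcat CAPC_R1.fq.gz | split -l 8 -d -a 3 - 20-split/CAPC_R1_

# pigz is a multi-threaded drop-in for gzip, so fall back to gzip if it is absent
GZ=$(command -v pigz || command -v gzip)
"$GZ" -f 20-split/CAPC_R1_*
```

Repeat the same split for the R2 file so the chunk pairs stay in sync.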
Use SeqPrep to merge overlapping paired-end reads.
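A SeqPrep call for one chunk might look like the sketch below. The chunk file names are assumptions based on the folder listing above, adapter options are left at SeqPrep's defaults, and the command is guarded so the sketch is a no-op when SeqPrep is not installed:

```shell
mkdir -p 21-merge
# -f/-r: paired input chunks; -1/-2: unmerged output pair; -s: merged reads
if command -v SeqPrep >/dev/null 2>&1; then
  SeqPrep -f 20-split/CAPC_R1_001.fastq.gz -r 20-split/CAPC_R2_001.fastq.gz \
          -1 21-merge/CAPC_unmerged_R1_001.fastq.gz \
          -2 21-merge/CAPC_unmerged_R2_001.fastq.gz \
          -s 21-merge/CAPC_merged_001.fastq.gz
else
  echo "SeqPrep not found; skipping merge sketch"
fi
```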
Remove the linker sequences from the reads and regenerate the FASTQ files.
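Linker removal with cutadapt could be sketched as below. LINKER_SEQ is a placeholder only (the actual bridge-linker sequence is not stated here), the file names are assumptions, and the command is guarded so it is a no-op without cutadapt:

```shell
mkdir -p 02-cutadapt
LINKER_SEQ="ACGT"  # placeholder only; substitute the real linker sequence
# -g trims the linker from the 5' end; use -a instead for a 3'-end linker
if command -v cutadapt >/dev/null 2>&1; then
  cutadapt -g "$LINKER_SEQ" \
           -o 02-cutadapt/CAPC_merged_001_T1.fastq.gz \
           21-merge/CAPC_merged_001.fastq.gz
else
  echo "cutadapt not found; skipping linker-removal sketch"
fi
```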
Filter the regenerated FASTQ files.
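The filtering criterion is not spelled out here; as an illustration, the sketch below applies a minimal length filter in awk that routes each read to an OK or Short file. The threshold and the OK/Short naming are assumptions based on the file names under 03-after_filter, and a toy input stands in for a regenerated FASTQ file:

```shell
MINLEN=20  # assumed threshold; the actual cutoff may differ
# Toy input: one 24 nt read and one 4 nt read
printf '@r1\nACGTACGTACGTACGTACGTACGT\n+\nIIIIIIIIIIIIIIIIIIIIIIII\n@r2\nACGT\n+\nIIII\n' > demo_R1.fastq
# Route each 4-line FASTQ record by sequence length
awk -v min="$MINLEN" '
  NR % 4 == 1 { h = $0 }
  NR % 4 == 2 { s = $0 }
  NR % 4 == 3 { p = $0 }
  NR % 4 == 0 {
    out = (length(s) >= min) ? "CAPC_OK_R1.fastq" : "CAPC_Short_R1.fastq"
    print h "\n" s "\n" p "\n" $0 >> out
  }' demo_R1.fastq
```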
Reassemble the filtered FASTQ files into clean FASTQ files for the next step of mapping.
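Reassembly can use plain cat, since concatenated gzip streams decompress as one valid gzip file. The per-chunk part names below are hypothetical, with toy inputs for illustration:

```shell
mkdir -p 04-fastq/CAPC
# Toy chunk files standing in for the filtered per-chunk outputs
printf '@r1\nACGT\n+\nIIII\n' | gzip > CAPC_OK_R1_001.fastq.gz
printf '@r2\nTTTT\n+\nIIII\n' | gzip > CAPC_OK_R1_002.fastq.gz
# Concatenated gzip members form a single valid .gz, so cat is enough
cat CAPC_OK_R1_001.fastq.gz CAPC_OK_R1_002.fastq.gz \
  > 04-fastq/CAPC/CAPC_merged_R1.fastq.gz
```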
Now we have FASTQ data without linkers. Next, we use HiC-Pro for mapping. We provide our HiC-Pro config file CAPC_Arabidopsis_config-hicpro.txt, and run the command hicpro -i 04-fastq -o 05-rmdup_rmmulti_Seperate_HiCPro_result -c Arabidopsis_DNase_config-hicpro.txt