Details of Hi-C data mapping

Question

yangfangyuan0102 opened this issue 2 years ago

Hi, dear author,
Are there any technical details I need to know if I don't plan to follow ArimaGenomics/mapping_pipeline? That pipeline is not very neat compared with the requirements of other scaffolding programs. I would like to produce a usable BAM file myself using bwa and samtools.
Thanks

Best

Chenxi Zhou · Answer 1 · Thu Dec 08 2022 05:04:22 GMT+0800 (China Standard Time)

Hello @yangfangyuan0102,

We also tried bwa mem with the -5SP options to map the R1 and R2 reads together, followed by samtools fixmate to fill in mate information, sorting by coordinate, and finally marking duplicates. The bwa mem step is quite similar to the Omni-C mapping.

Best,
Chenxi

Chenxi Zhou · Answer 2 · Thu Dec 08 2022 05:14:18 GMT+0800 (China Standard Time)

For samtools fixmate, we used the -mp options.
Chenxi
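
The steps described in the two answers above can be put together roughly as follows. This is a minimal sketch and not the author's exact commands: the file names, the thread count, and the variable and function names (GENOME, READ1, READ2, THREADS, OUT, hic_map_cmd) are placeholders assumed for illustration. The script only prints the pipeline so that it can be reviewed first; it marks duplicates without removing them, as described in Answer 1.

```shell
#!/bin/sh
# Sketch only: prints the mapping pipeline so it can be inspected.
# To actually run it (bwa and samtools must be installed): sh hic_map.sh | bash
set -eu

# Hypothetical placeholders; override them through the environment.
# Paths containing single quotes are not handled by this sketch.
GENOME=${GENOME:-assembly.fa}
READ1=${READ1:-hic_R1.fastq.gz}
READ2=${READ2:-hic_R2.fastq.gz}
THREADS=${THREADS:-8}
OUT=${OUT:-hic.markdup.bam}

hic_map_cmd() {
  # bwa mem -5SP: -5 takes the split alignment with the smallest coordinate
  #   as primary, -S skips mate rescue, -P skips pairing.
  # samtools fixmate -m adds the mate score tags that markdup needs;
  #   -p disables the FR proper-pair check.
  # samtools sort orders by coordinate, which markdup requires.
  # samtools markdup without -r flags duplicates but keeps them in the BAM.
  cat <<EOF
set -euo pipefail
bwa index '$GENOME'
bwa mem -5SP -t $THREADS '$GENOME' '$READ1' '$READ2' |
  samtools fixmate -mp - - |
  samtools sort -@ $THREADS - |
  samtools markdup -@ $THREADS - '$OUT'
EOF
}

hic_map_cmd
```

For example, `GENOME=my_asm.fa THREADS=16 sh hic_map.sh` prints the commands with those values substituted. Adding -r to samtools markdup would remove the duplicates instead of only flagging them.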

yangfangyuan0102 · Answer 3 · Mon Jan 23 2023 13:28:43 GMT+0800 (China Standard Time)

bwa-mem2 index "$genome"
bwa-mem2 mem -SP5 -t "$cpu" "$genome" "$read1" "$read2" \
  | samtools view -@ 5 -b - \
  | samtools fixmate -mp -@ 5 - - \
  | samtools sort -m 2g -@ 5 - \
  | samtools markdup -@ 5 -r - alignment.bam

Laura Uelze · Answer 4 · Thu Dec 14 2023 22:18:33 GMT+0800 (China Standard Time)

Sorry, just to help me understand: I am also using the Arima mapping pipeline, but I find it really slow and cumbersome (we are working with a 40 Gbp tree genome). Is the above code

bwa-mem2 index "$genome"
bwa-mem2 mem -SP5 -t "$cpu" "$genome" "$read1" "$read2" \
  | samtools view -@ 5 -b - \
  | samtools fixmate -mp -@ 5 - - \
  | samtools sort -m 2g -@ 5 - \
  | samtools markdup -@ 5 -r - alignment.bam

all that is necessary to replace the arima pipeline?

Follow-up: No, unfortunately the results seem worse than with the full Arima pipeline...