c-zhou / yahs

Yet another Hi-C scaffolding tool

Details of Hi-C data mapping

yangfangyuan0102 opened this issue

Hi, dear author,
Are there any technical details I need to get right if I don't plan to follow ArimaGenomics/mapping_pipeline? It is not very tidy compared with what other scaffolding programs require. I would like to produce a usable BAM myself using bwa and samtools.
Thanks

Best

Hello @yangfangyuan0102,

We also tried bwa mem with the -5SP options to map the R1 and R2 reads together, then samtools fixmate to fill in mate information, followed by sorting by coordinate and finally marking duplicates. The bwa mem step is quite similar to Omni-C mapping.

Best,
Chenxi

For samtools fixmate, we used the -mp options.
Chenxi

bwa-mem2 index $genome
bwa-mem2 mem -SP5 -t $cpu $genome $read1 $read2 | samtools view -@ 5 -b - | samtools fixmate -mp -@ 5 - - | samtools sort -m 2g -@ 5 - | samtools markdup -@ 5 -r - alignment.bam
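
For anyone reading the one-liner above against the steps Chenxi describes, here is a minimal sketch that assembles the same pipeline stage by stage, with a comment per stage. It is not an official yahs recipe. The function name hic_map_pipeline and its argument order are made up for illustration, and it only prints the command rather than running it, so it can be inspected first. It assumes file paths without spaces.

```shell
#!/bin/sh
# Sketch only: print the Hi-C mapping pipeline from this thread as one command.
# hic_map_pipeline is a hypothetical helper, not part of yahs, bwa-mem2 or samtools.
# Usage: hic_map_pipeline GENOME READ1 READ2 [THREADS] [OUT_BAM]
hic_map_pipeline() (
    genome=$1
    read1=$2
    read2=$3
    threads=${4:-4}            # default thread count is an arbitrary choice
    out=${5:-alignment.bam}

    # Stage 1: bwa-mem2 mem -5SP
    #   -S skips mate rescue, -P skips pairing, and -5 changes which segment
    #   of a split alignment is reported as primary.
    # Stage 2: samtools view -b converts the SAM stream to BAM.
    # Stage 3: samtools fixmate -mp fills in mate information;
    #   -m adds the mate score tag used later by markdup,
    #   -p disables the FR proper-pair check.
    # Stage 4: samtools sort orders the records by coordinate.
    # Stage 5: samtools markdup -r removes duplicate reads.
    printf '%s | ' \
        "bwa-mem2 mem -5SP -t $threads $genome $read1 $read2" \
        "samtools view -@ $threads -b -" \
        "samtools fixmate -mp -@ $threads - -" \
        "samtools sort -m 2g -@ $threads -"
    printf '%s\n' "samtools markdup -r -@ $threads - $out"
)
```

Something like `hic_map_pipeline genome.fa R1.fq.gz R2.fq.gz 16 hic.bam` prints the command, and piping that output to `sh` would run it, provided the reference has already been indexed with bwa-mem2 index.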

Sorry, just to help me understand: I am also using the arima mapping pipeline, but found it really slow and cumbersome (we have a 40 Gbp genome of a tree). Is the above code

bwa-mem2 index $genome
bwa-mem2 mem -SP5 -t $cpu $genome $read1 $read2 | samtools view -@ 5 -b - | samtools fixmate -mp -@ 5 - - | samtools sort -m 2g -@ 5 - | samtools markdup -@ 5 -r - alignment.bam

all that is necessary to replace the arima pipeline?

Follow-up: no, unfortunately the results seem worse than with the full Arima pipeline...