Details of Hi-C data mapping
yangfangyuan0102 opened this issue
Hi, dear author,
Are there any technical details I need to get right if I don't want to follow ArimaGenomics/mapping_pipeline? That pipeline is not very neat compared with the input requirements of other scaffolding programs. I would like to produce a usable BAM file myself with bwa and samtools.
Thanks
Best
Hello @yangfangyuan0102,
We also tried using bwa mem with the -5SP options to map R1 and R2 reads together, then samtools fixmate to fill in mate information, followed by sorting by coordinate and finally marking duplicates. The bwa mem part is quite similar to the Omni-C mapping.
Best,
Chenxi
For samtools fixmate, we used the -mp options (-m adds the mate score tags that samtools markdup uses; -p disables the FR proper-pair check).
Chenxi
bwa-mem2 index $genome
bwa-mem2 mem -SP5 -t $cpu $genome $read1 $read2 | samtools view -@ 5 -b - | samtools fixmate -mp -@ 5 - - | samtools sort -m 2g -@ 5 - | samtools markdup -@ 5 -r - alignment.bam
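For readers who want to reuse Chenxi's commands, here is a minimal sketch that wraps them in a shell function and comments each flag. It is an illustration, not an official script: the function name `map_hic`, the default of 8 bwa-mem2 threads and the output name `alignment.bam` are hypothetical choices, and it assumes `bwa-mem2` and a recent `samtools` (one that provides `markdup` and `fixmate -m`) are on `PATH`. It adds `set -euo pipefail` so that a failure in any step stops the run, and finishes with `samtools index -c` and `samtools flagstat` as optional checks; the mapping commands themselves are the ones posted above.

```shell
#!/usr/bin/env bash
# Hypothetical helper wrapping the bwa-mem2 + samtools Hi-C mapping commands.
# The body runs in a subshell ( ... ), so `set -euo pipefail` does not leak
# into the calling shell; pipefail makes a failing bwa-mem2 or samtools step
# fail the whole pipeline instead of being masked by the last command.
map_hic() (
  set -euo pipefail

  if [ "$#" -lt 3 ]; then
    echo "usage: map_hic GENOME.fa READ1.fq.gz READ2.fq.gz [CPU] [OUT.bam]" >&2
    exit 1
  fi

  genome=$1
  read1=$2
  read2=$3
  cpu=${4:-8}               # assumed default thread count for bwa-mem2
  out=${5:-alignment.bam}   # assumed default output name

  for tool in bwa-mem2 samtools; do
    command -v "$tool" >/dev/null || { echo "map_hic: $tool not found" >&2; exit 1; }
  done

  # Index the assembly (only needed once per genome).
  bwa-mem2 index "$genome"

  # -5: for split (chimeric) reads, make the piece with the smallest read
  #     coordinate (the 5'-most part) the primary alignment
  # -S: skip mate rescue; -P: skip pairing, so R1 and R2 are placed
  #     independently instead of being forced into a normal insert-size pair
  # bwa output keeps mates next to each other, which is what fixmate expects.
  # fixmate -m adds the mate score (ms) tag used by markdup; -p disables the
  # FR proper-pair check; markdup -r removes duplicates instead of flagging them.
  bwa-mem2 mem -5SP -t "$cpu" "$genome" "$read1" "$read2" \
    | samtools view -@ 5 -b - \
    | samtools fixmate -mp -@ 5 - - \
    | samtools sort -m 2g -@ 5 - \
    | samtools markdup -@ 5 -r - "$out"

  samtools index -c "$out"   # CSI index of the coordinate-sorted output
  samtools flagstat "$out"   # quick summary of mapped and paired reads
)
```

Usage, with hypothetical file names: `source map_hic.sh && map_hic genome.fa sample_R1.fastq.gz sample_R2.fastq.gz 32 alignment.bam`. A CSI index is used because BAI cannot index sequences longer than 2^29 bp (roughly 537 Mbp), which can matter for very large genomes.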
Sorry, just to help me understand: I am also using the Arima mapping pipeline, but found it really slow and cumbersome (we have a 40 Gbp tree genome). Is the bwa-mem2/samtools command above all that is necessary to replace the Arima pipeline?
Follow-up: No, unfortunately the results seem worse than with the full Arima pipeline.