Question on homologous regions

vladsavelyev opened this issue 7 years ago

Hi,

I ran BWA, LongRanger, and EMA on the NA12878 WGS dataset from https://support.10xgenomics.com/genome-exome/datasets/2.1.4/NA12878_WGS_v2, and looked at the challenging regions listed in your notebook:

C4A 6:31965242

AMY1 1:104197843

CYP2D7 22:42537120

It looks like EMA has higher coverage in those regions; however, all of the extra alignments are secondary (shaded in the screenshot). Is that the expected picture for EMA? Could those secondary alignments not be resolved using the linked-read information? As far as I understand, secondary alignments are ignored by variant callers. They also seem to add noise to the variation, as in this CYP2D6 screenshot (more vertical colored lines in the coverage tracks):

Please correct me if I'm wrong. I see you evaluated those regions with a more sophisticated strategy by checking against the NA12878 assemblies, so I'm probably not looking at these regions the right way.

A. R. Shajii · Answer 1 · Thu Mar 22 2018 00:12:23 GMT+0800 (China Standard Time)

I believe those are actually low-MAPQ alignments (EMA doesn't output separate secondary alignments right now, aside from the XA tag which AFAIK doesn't show up in IGV). I was actually planning on refining how MAPQs are assigned to reads in these homologous regions, which should fix this.
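For anyone who wants to check this distinction on their own alignments, here is a minimal sketch in plain Python that separates true secondary alignments (SAM flag bit 0x100) from primary alignments that merely have a low MAPQ, and reports whether an XA tag is present. It relies only on the standard SAM column layout. The function names, the MAPQ cutoff of 10, and the example records are all hypothetical, chosen for illustration; none of this is EMA's own code.

```python
from collections import Counter

# Standard SAM flag bits (from the SAM specification).
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def classify_sam_line(line, low_mapq_threshold=10):
    """Classify one SAM alignment line (hypothetical helper).

    low_mapq_threshold is an illustrative cutoff, not a value used by EMA.
    """
    fields = line.rstrip("\n").split("\t")
    flag = int(fields[1])   # column 2: bitwise FLAG
    mapq = int(fields[4])   # column 5: MAPQ
    if flag & UNMAPPED:
        return "unmapped"
    if flag & SECONDARY:
        return "secondary"
    if flag & SUPPLEMENTARY:
        return "supplementary"
    if mapq < low_mapq_threshold:
        return "low_mapq_primary"
    return "primary"


def has_xa_tag(line):
    """True if the optional fields (column 12 onward) include an XA:Z tag."""
    optional = line.rstrip("\n").split("\t")[11:]
    return any(field.startswith("XA:Z:") for field in optional)


def summarize(lines):
    """Count alignment classes, skipping '@' header lines."""
    return Counter(classify_sam_line(l) for l in lines if not l.startswith("@"))


# Made-up records for illustration only (not real EMA output).
SEQ, QUAL = "A" * 50, "I" * 50
example = [
    "@HD\tVN:1.6",
    f"r1\t0\t6\t31965242\t0\t50M\t*\t0\t0\t{SEQ}\t{QUAL}\tXA:Z:6,+31997990,50M,1;",
    f"r2\t0\t6\t31965300\t60\t50M\t*\t0\t0\t{SEQ}\t{QUAL}",
    f"r3\t256\t6\t31965310\t0\t50M\t*\t0\t0\t{SEQ}\t{QUAL}",
]

if __name__ == "__main__":
    print(summarize(example))
```

In this sketch, `r1` is a primary alignment with MAPQ 0 and an XA tag, which is the case described above, while only `r3` carries the secondary flag.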

Vlad Savelyev · Answer 2 · Thu Mar 22 2018 07:42:44 GMT+0800 (China Standard Time)

That's right, those are primary alignments; sorry for the confusion (most of them have better matches somewhere else, but I guess we can trust them since you proved they are supported by the assemblies). That's good news then; looking forward to the refined MAPQ.