Incorrect mate start with supplementary alignments
adthrasher opened this issue
Hello,
I have encountered an issue where STAR is pairing read records incorrectly. In the following set of reads, the 4th record is read 1 and is unaligned. It records its mate as having a position of chr2:32916431. However, that is the start position of one of the secondary alignments (flag 393) for read 2. It should point to the primary alignment (chr2:32916428). I didn't see an issue for this and I didn't find anything in the documentation to explain it. I am using STAR 2.7.11b. I've attached a zip with the two reads and the command I'm invoking. I am aligning to GRCh38.
A00466:235:HCNJGDRX2:1:2104:19678:5822 137 chr2 32916428 1 126M * 0 0 GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGCGGGGCGGGGGCGGGCGCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG FFFFFFFFFFFFFFFFFFFFFF:FF,:FF::,:,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,:,::,,,,,:,::,,,,F,,,:,,,,,:,,,,,,,,:,:F,,:,:,:,:::FF,:F NH:i:3 HI:i:1 AS:i:110 nM:i:7 NM:i:7 MD:Z:13A26G4A5G3G1G0A67 RG:Z:c947640 SM:SJST033767_D1 LB:SJST033767_D1 PL:illumina PU:HCNJGDRX2.1 CN:STJUDE
A00466:235:HCNJGDRX2:1:2104:19678:5822 393 chr2 32916431 1 126M * 0 0 GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGCGGGGCGGGGGCGGGCGCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG FFFFFFFFFFFFFFFFFFFFFF:FF,:FF::,:,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,:,::,,,,,:,::,,,,F,,,:,,,,,:,,,,,,,,:,:F,,:,:,:,:::FF,:F NH:i:3 HI:i:2 AS:i:110 nM:i:7 NM:i:7 MD:Z:10A29G1A2G5G3A1G68 RG:Z:c947640 SM:SJST033767_D1 LB:SJST033767_D1 PL:illumina PU:HCNJGDRX2.1 CN:STJUDE
A00466:235:HCNJGDRX2:1:2104:19678:5822 393 chr2 32916429 1 126M * 0 0 GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGCGGGGCGGGGGCGGGCGCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG FFFFFFFFFFFFFFFFFFFFFF:FF,:FF::,:,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,:,::,,,,,:,::,,,,F,,,:,,,,,:,,,,,,,,:,:F,,:,:,:,:::FF,:F NH:i:3 HI:i:3 AS:i:110 nM:i:7 NM:i:7 MD:Z:12A27G3A0G5G3G1A68 RG:Z:c947640 SM:SJST033767_D1 LB:SJST033767_D1 PL:illumina PU:HCNJGDRX2.1 CN:STJUDE
A00466:235:HCNJGDRX2:1:2104:19678:5822 69 * 0 0 * chr2 32916431 0 GTCGGCGGGAGAGGCCGGGAGGGAGGAAGACGAACGGAA FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NH:i:0 HI:i:0 AS:i:110 nM:i:7 uT:A:4 RG:Z:c947640 SM:SJST033767_D1 LB:SJST033767_D1 PL:illumina PU:HCNJGDRX2.1 CN:STJUDE
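For reference, the FLAG values in the four records above (137, 393, 393, 69) can be decoded with a few lines of Python. This is only a minimal sketch: the bit values are the ones defined in the SAM specification, while `SAM_FLAG_BITS` and `decode_flag` are hypothetical names used for illustration and are not part of STAR or samtools.

```python
# FLAG bits as defined in the SAM specification (names are informal labels).
SAM_FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "read1",
    0x80: "read2",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the labels of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAG_BITS.items() if flag & bit]


for flag in (137, 393, 69):
    print(flag, decode_flag(flag))
# 137 ['paired', 'mate_unmapped', 'read2']
# 393 ['paired', 'mate_unmapped', 'read2', 'secondary']
# 69 ['paired', 'unmapped', 'read1']
```

So the record at chr2:32916428 (flag 137) is the primary alignment of read 2, and the two records with flag 393 are secondary alignments of the same read.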
Hi @adthrasher
In this case, one of the mates has three alignments while the other mate is not mapped, so the unmapped mate is output only once and attached to one of the alignments.
If you want the unmapped mate to be output for each alignment of the mapped mate, use --outSAMunmapped Within KeepPairs.
Yes, that is what I see. However, this is not SAM spec-compliant. The RNEXT and PNEXT fields for the unmapped read MUST be those of the primary alignment of the mate. Instead, STAR is pointing to one of the secondary alignments.
From https://samtools.github.io/hts-specs/SAMv1.pdf:
PNEXT: 1-based Position of the primary alignment of the NEXT read in the template. Set as 0 when
the information is unavailable. This field equals POS at the primary line of the next read. If PNEXT
is 0, no assumptions can be made on RNEXT and bit 0x20.
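The rule quoted above can be checked mechanically. Below is a rough sketch in plain Python (no pysam) that flags records whose RNEXT/PNEXT do not match RNAME/POS on the primary line of the mate. The function names are hypothetical, the records are the four from this issue truncated to their first nine fields (SEQ, QUAL and tags omitted), and records with PNEXT of 0 are skipped because the spec treats that as "information unavailable".

```python
from collections import defaultdict

SECONDARY = 0x100
SUPPLEMENTARY = 0x800
READ1 = 0x40
READ2 = 0x80


def parse_record(line):
    """Parse the leading mandatory SAM fields from a whitespace-separated line."""
    fields = line.split()
    return {
        "qname": fields[0],
        "flag": int(fields[1]),
        "rname": fields[2],
        "pos": int(fields[3]),
        "rnext": fields[6],
        "pnext": int(fields[7]),
    }


def find_mate_pointer_mismatches(lines):
    """Return (qname, reported, expected) for records whose RNEXT/PNEXT
    differ from RNAME/POS on the primary line of the mate."""
    templates = defaultdict(list)
    for line in lines:
        rec = parse_record(line)
        templates[rec["qname"]].append(rec)

    mismatches = []
    for qname, recs in templates.items():
        # Primary line of each mate: neither secondary nor supplementary.
        primary = {}
        for rec in recs:
            if not rec["flag"] & (SECONDARY | SUPPLEMENTARY):
                primary[rec["flag"] & (READ1 | READ2)] = rec
        for rec in recs:
            if rec["pnext"] == 0:
                continue  # information unavailable, nothing to check
            mate = primary.get(READ2 if rec["flag"] & READ1 else READ1)
            if mate is None:
                continue
            # "=" in RNEXT means "same reference as RNAME".
            rnext = rec["rname"] if rec["rnext"] == "=" else rec["rnext"]
            reported = (rnext, rec["pnext"])
            expected = (mate["rname"], mate["pos"])
            if reported != expected:
                mismatches.append((qname, reported, expected))
    return mismatches


records = [
    "A00466:235:HCNJGDRX2:1:2104:19678:5822 137 chr2 32916428 1 126M * 0 0",
    "A00466:235:HCNJGDRX2:1:2104:19678:5822 393 chr2 32916431 1 126M * 0 0",
    "A00466:235:HCNJGDRX2:1:2104:19678:5822 393 chr2 32916429 1 126M * 0 0",
    "A00466:235:HCNJGDRX2:1:2104:19678:5822 69 * 0 0 * chr2 32916431 0",
]
mismatches = find_mate_pointer_mismatches(records)
for qname, reported, expected in mismatches:
    print(qname, "reports mate at", reported, "but primary is at", expected)
```

On these records the only mismatch is the unmapped read 1, which reports chr2:32916431 while the primary line of read 2 is at chr2:32916428.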