alexdobin / STAR

RNA-seq aligner

Understanding variants SAM tags

Yenaled opened this issue

I'm using STAR WASP to map variants in reads, and outputting the relevant SAM tags.

I'm having trouble interpreting them.
For example, I get the following:

vA:B:c,1,2 vG:B:i,16570789,16570893 vW:i:1

What does vA:B:c,1,2 mean?

Other times, I'll see stuff like vA:B:c,1,2,3,1 or vA:B:c,4,1,1. What do those mean?

I am using a VCF file containing two strains of interest (placed in the last two columns of the VCF file). But which value in the vA:B: output is the ref allele? Which one is the first strain's allele? Which one is the second strain's allele?

OK, I understand now. Look at the GT field in the VCF file. If the GT is X/Y, then in the vA tag 1 = X and 2 = Y (the other values are for special cases). It seems vA is agnostic to which allele is REF or ALT, which is good, since we might want to map against two non-REF variants. If there are multiple strains in the VCF file, only the GTs in the first genotype (sample) column are considered.
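To make the interpretation above concrete, here is a minimal Python sketch that pairs each vA code with its vG coordinate and labels it using a GT string. It assumes that vA and vG are parallel arrays (the i-th allele code belongs to the i-th coordinate), which is what the example tags suggest. The function names `parse_int_array`, `pair_variants`, and `allele_label` are hypothetical helpers, not part of STAR, and the GT value `0/1` is made up for illustration.

```python
def parse_int_array(field):
    """Split a SAM B-type field such as 'vA:B:c,1,2' into (tag, [ints])."""
    tag, sam_type, value = field.split(":", 2)
    if sam_type != "B":
        raise ValueError(f"{tag} is not a B-type array tag")
    _subtype, *items = value.split(",")  # the first item is the element type, e.g. 'c' or 'i'
    return tag, [int(x) for x in items]


def pair_variants(fields):
    """Return [(coordinate, allele_code), ...], assuming vA and vG are parallel arrays."""
    arrays = dict(parse_int_array(f) for f in fields if f.split(":", 2)[1] == "B")
    alleles = arrays.get("vA", [])
    positions = arrays.get("vG", [])
    if len(alleles) != len(positions):
        raise ValueError("vA and vG have different lengths")
    return list(zip(positions, alleles))


def allele_label(code, gt):
    """Map vA code 1 or 2 to the first or second GT allele; anything else is 'other'."""
    first, second = gt.replace("|", "/").split("/")
    return {1: first, 2: second}.get(code, "other")


# Optional fields copied from the alignment in the question.
fields = "vA:B:c,1,2 vG:B:i,16570789,16570893 vW:i:1".split()

for position, code in pair_variants(fields):
    # GT '0/1' is an assumed example; look up the real GT for each position in the VCF.
    print(position, code, allele_label(code, gt="0/1"))
```

In practice the GT would be read per position from the first genotype column of the VCF rather than hard-coded, and codes other than 1 and 2 would need their own handling.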

I've answered this on biostars (and let me know if there are any problems with my answer). Closing this for now.

https://www.biostars.org/p/9592236/