aquaskyline / Skyhawk

An Artificial Neural Network-based discriminator for validating clinically significant genomic variants



haploid genotypes cause an error in dataPrepScripts/GetTruth.py

AndrewCarroll opened this issue

Lines 68-69 of dataPrepScripts/GetTruth.py:

```python
varType = last.split(":")[0].replace("/","|").replace(".","0").split("|")
p1, p2 = varType
```

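These lines will fail if the variant site is haploid: a genotype such as `1` splits into a single element, so unpacking it into `p1, p2` raises a `ValueError`. It seems that this will cause Skyhawk to fail for certain variant callers, even if the majority of the sites are diploid and only a few are haploid.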
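A minimal reproduction of the failure, wrapping the two quoted lines in a hypothetical function `parse_gt_original` and feeding it hypothetical sample columns (`"0/1:30"` for a diploid call, `"1:30"` for a haploid one). The rest of `GetTruth.py` is not reproduced here; the sketch only assumes that `last` holds the sample column whose first `:`-separated subfield is `GT`.

```python
def parse_gt_original(last):
    # Hypothetical wrapper around lines 68-69: take the GT subfield,
    # normalize "/" to "|", map missing alleles "." to "0", then split.
    varType = last.split(":")[0].replace("/", "|").replace(".", "0").split("|")
    p1, p2 = varType  # assumes exactly two alleles (diploid)
    return p1, p2


print(parse_gt_original("0/1:30"))  # diploid call unpacks fine: ('0', '1')

try:
    parse_gt_original("1:30")  # haploid call: split() yields only ['1']
except ValueError as err:
    print("ValueError:", err)
```

The haploid case fails at the unpacking step rather than at parsing, which is why a single haploid record anywhere in the VCF is enough to stop the run.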

There are good reasons a variant caller might decide not to write a diploid call (chrX or chrY come to mind).

Here is a snippet of a Strelka2 VCF (HG001 on hg38) with a haploid call on chromosome 1 (maybe it thinks there is a deletion here?) that causes Skyhawk to fail. I'm not sure how you want to handle haploid sites, but I thought you would like to know, as this does seem to limit Skyhawk to certain callers.

strelka.pass.1000.vcf.gz
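One possible way to tolerate haploid sites, sketched under stated assumptions: the helper `parse_gt` below is hypothetical and is not the change made in 63e7e9d, which this thread does not show. It treats a haploid genotype as homozygous by duplicating the single allele; that policy is an assumption, and skipping haploid records instead would be an equally defensible choice.

```python
def parse_gt(last):
    """Return (p1, p2) allele strings from a VCF sample column.

    Hypothetical helper. Haploid calls such as "1" are reported as
    ("1", "1"); this homozygous-expansion policy is an assumption.
    """
    # Same normalization as the quoted lines: GT subfield, "/" -> "|", "." -> "0".
    gt = last.split(":")[0].replace("/", "|").replace(".", "0")
    alleles = gt.split("|")
    if len(alleles) == 1:
        # Haploid call: duplicate the single allele instead of failing.
        return alleles[0], alleles[0]
    if len(alleles) == 2:
        return alleles[0], alleles[1]
    # Anything else (e.g. polyploid GT) is rejected with a clear message.
    raise ValueError(f"unsupported ploidy {len(alleles)} in GT {gt!r}")


print(parse_gt("1:30"))    # haploid call expanded to a homozygous pair
print(parse_gt("0|1:30"))  # diploid call unchanged
```

Raising on ploidy other than one or two keeps unexpected input visible instead of silently producing a wrong pair.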

Fixed in 63e7e9d.