Call in high heterozygotes

Question

Axze-rgb opened this issue 9 months ago

Hello,

Usually, when we think about mapping reads and calling variants, we have the human genome in mind. But I am dealing with an organism that has between 2 and 3% heterozygosity, and even more at the telomeres. This causes an issue: since Illumina reads map with a lot of mismatches, callers tend to ditch a lot of what we think are valid SNPs (we think so because we have limited long-read data supporting it). They get ditched either because the reads map with a relatively poor score due to the many mismatches, or because the SNPs are too close to one another. I see this with GATK, for example. GATK still works for showing broad trends, but I am interested in developing a more refined analysis.

Is there a way to deal with high heterozygosity in Octopus? Basically, there are two issues to deal with:

Reads mapping with many more mismatches than in the case of the human genome.
SNPs being physically close to one another (high density).
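
For a rough sense of scale, here is a back-of-the-envelope sketch of both issues. It assumes that the heterozygosity figure means 2 to 3% of sites differ between haplotypes, that heterozygous sites are spread uniformly, and that reads are 150 bp long. The read length and the helper names are assumptions made for illustration, not measured values or anything taken from Octopus or GATK.

```python
def expected_mismatches(read_length: int, heterozygosity: float) -> float:
    """Expected mismatches in a read from the non-reference haplotype,
    assuming heterozygous sites are spread uniformly along the genome."""
    return read_length * heterozygosity


def mean_snp_spacing(heterozygosity: float) -> float:
    """Average distance in bp between neighbouring heterozygous sites."""
    return 1.0 / heterozygosity


READ_LENGTH = 150  # assumed short-read length, for illustration only

for het in (0.02, 0.03):
    print(
        f"heterozygosity {het:.0%}: "
        f"about {expected_mismatches(READ_LENGTH, het):.1f} mismatches per read, "
        f"one SNP every {mean_snp_spacing(het):.0f} bp on average"
    )
```

Under these assumptions, a read from the non-reference haplotype carries several mismatches, and neighbouring SNPs sit only a few tens of bases apart, which is the regime described in the two points above.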

I don't understand Octopus well enough to know what parameters to tweak. I will happily try any suggestion.