luntergroup / octopus

Bayesian haplotype-based mutation calling

--refcall POSITIONAL reports * instead of . in ALT column for reference call creating triallelic calls downstream

Kelzor opened this issue · comments

Describe the bug

When I merge VCFs generated with --refcall POSITIONAL and -P 2, I get triallelic genotype calls (2|0, 0|2) in the merged VCF that are not present in the individual VCF files. I think it has to do with how bcftools treats "*" in the ALT column. Is there a simple way to change the * to . so that bcftools identifies the ALT as a reference call and not an alternative allele?

For example, in the single .vcf, this is the result:

MTB_anc 1143 . G A
1|0:96:21:37:1104:100:15,6:0.714,0.286:0.214,0.214:0.363:41,32:21:PASS

In the merged .vcf, this is the result at the same position. The first line of genotype data is the same sample as above; the other two are the merged samples, which are both homozygous reference:

MTB_anc 1143 . G *,A
2|0:96:21:37:15,.,6:0.714,.,0.286:0.214,.,0.214:0.363:41,.,32:21:PASS:1104:100
0|0:177:59:37:59,.,.:1,.,.:0,.,.:.:41,.,.:59:PASS:.:.
0|0:75:25:37:25,.,.:1,.,.:0,.,.:.:41,.,.:25:PASS:.:.
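For reference, here is a minimal sketch of how the GT indices in the two records above map to alleles. It assumes only the standard VCF convention that index 0 is REF and indices 1..n are the ALT alleles in listed order. The helper name `decode_gt` is hypothetical and not part of octopus or bcftools.

```python
def decode_gt(ref, alt, gt):
    """Map VCF genotype indices to allele strings (0 = REF, 1..n = ALT in order)."""
    alleles = [ref] + alt.split(",")
    # Phased genotypes use "|", unphased ones use "/".
    sep = "|" if "|" in gt else "/"
    return sep.join("." if i == "." else alleles[int(i)] for i in gt.split(sep))


# Single-sample record: REF=G, ALT=A, GT=1|0
print(decode_gt("G", "A", "1|0"))
# Merged record: REF=G, ALT=*,A, GT=2|0
print(decode_gt("G", "*,A", "2|0"))
```

Under that convention both genotypes decode to the same pair of alleles (A and G). The index changes from 1 to 2 only because `*` takes the first ALT slot in the merged record.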

Attached is a screenshot of the same three samples in IGV for that position. You can see the calls should be 0|1, 0|0, 0|0.

This will become a problem for me at sites that are actually triallelic. Thank you!
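One possible workaround, sketched below, is to rewrite the ALT column of each per-sample VCF before merging. It assumes the reference-call records have an ALT column that is exactly `*`. Records where `*` is listed alongside other ALT alleles are left untouched, because their genotype indices would otherwise need renumbering. The function names are hypothetical, and whether bcftools then merges the records as intended has not been verified here.

```python
def star_alt_to_dot(line):
    """Return a VCF line with ALT rewritten from '*' to '.' when ALT is exactly '*'."""
    if line.startswith("#"):
        return line  # header lines pass through unchanged
    fields = line.rstrip("\n").split("\t")
    # Column 5 (index 4) is ALT in a tab-delimited VCF record.
    if len(fields) > 4 and fields[4] == "*":
        fields[4] = "."
    return "\t".join(fields) + "\n"


def rewrite_vcf(lines):
    """Apply star_alt_to_dot to every line of an uncompressed VCF."""
    return [star_alt_to_dot(line) for line in lines]


# Made-up records for illustration only (not taken from the files in this issue).
example = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample\n",
    "chrX\t10\t.\tG\t*\t.\tPASS\t.\tGT\t0|0\n",
    "chrX\t11\t.\tG\tA\t.\tPASS\t.\tGT\t1|0\n",
]
for out in rewrite_vcf(example):
    print(out, end="")
```

The sketch works on uncompressed text. A rewritten file would still need to be compressed and indexed again before being passed to bcftools merge.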

Version

$ octopus --version
octopus version 0.7.4
Target: x86_64 Linux 5.10.25-linuxkit
SIMD extension: AVX2
Compiler: GNU 11.1.0
Boost: 1_76

Command
Command line to install octopus:

$ singularity build octopus.sif docker://dancooke/octopus

Command line to run octopus:

$ singularity run -B /data/stonelab:/data/stonelab /home/keblevin/octopus.sif \
--reference /data/stonelab/references/MTB_ancestor/MTB_ancestor.fasta \
-I /data/stonelab/Kelly_TB/BAMS_BAMS_BAMS/Vagene_et_al_bams/Colombia_1321_4402.bam \
-P 2 \
--refcall POSITIONAL \
--annotations \
--threads \
--output "19-4-23-Col1321-refcall-2P-annotations.vcf"

[Attached image: Screenshot 2023-04-20 at 12 36 50, IGV view of the three samples at this position]