The INDEL/TR variant rs746071566 was not called by Bcftools v1.20 with --indels-cns in mpileup
davidyuyuan opened this issue
The rs746071566 in NUDT15 is at 13:48037783 as shown in EnsEMBL http://useast.ensembl.org/Homo_sapiens/Variation/Explore?db=core;r=13:48037283-48038301;v=rs746071566;vdb=variation;vf=816843132.
Bcftools called the variant at 48037782 for the G1K sample NA18945, which is off by 1. Please note that NUDT15 is on the template strand. It is not related to the HGVS 3' rule.
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA18945
chr13 48037782 . A AGGAGTC 222.386 . INDEL;IDV=14;IMF=0.368421;DP=38;VDB=0.0238714;SGB=-0.686358;RPBZ=-2.52849;MQBZ=0;MQSBZ=0;BQBZ=-4.63681;SCBZ=-0.763763;MQ0F=0;AC=1;AN=2;DP4=13,11,9,5;MQ=60 GT:PL 0/1:255,0,189
I used the following code invoking the new option --indels-cns. Perhaps I picked some wrong options in the call and norm steps, which might not work well with the new --indels-cns?
"${HOME}/bcftools/bcftools" mpileup --indels-cns -B -r "${regions}" -Ou -f "${ref_genome}" "${bam_file}" | \
"${HOME}/bcftools/bcftools" call --ploidy "${ploidy}" -Ou -mv | \
"${HOME}/bcftools/bcftools" norm -a -m +any -f "${ref_genome}" -Oz -o "${output_dir}/${file}.unphased.vcf.gz" --write-index
Indel representation can be ambiguous. The consensus is to use the most left-aligned, parsimonious representation. Please read more on the topic here: https://genome.sph.umich.edu/wiki/Variant_Normalization
If you try to normalize the variant you linked to in ENSEMBL, you'll see it is not left-aligned:
chr13 48037783 . GGAGTCG G,GGAGTCGGAGTCG,GGAGTCGGAGTCGGAGTCG . . .
becomes
chr13 48037782 . AGGAGTC A,AGGAGTCGGAGTC,AGGAGTCGGAGTCGGAGTC . . .
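For readers who want to see why the position shifts by one, here is a minimal Python sketch of the left-align-and-trim idea described on the Variant Normalization page linked above. It is not bcftools' implementation. The function name `left_align` and the toy reference `TTAGGAGTCGCC` are assumptions made up for illustration: only the middle `AGGAGTCG` is taken from the REF alleles in the records above, the flanking bases are invented, and positions are 1-based into the toy string rather than real chr13 coordinates.

```python
def left_align(ref_seq, pos, alleles):
    """Left-align and trim one VCF-style record (sketch, not bcftools code).

    ref_seq: reference sequence as a string (toy data here)
    pos:     1-based position of the first base of the REF allele
    alleles: [REF, ALT1, ALT2, ...]; assumes REF differs from every ALT
    """
    alleles = list(alleles)
    changed = True
    while changed:
        changed = False
        # Drop the last base while every allele ends with the same base.
        if all(alleles) and len({a[-1] for a in alleles}) == 1:
            alleles = [a[:-1] for a in alleles]
            changed = True
        # If an allele became empty, extend all alleles one base to the left.
        if not all(alleles) and pos > 1:
            pos -= 1
            base = ref_seq[pos - 1]
            alleles = [base + a for a in alleles]
            changed = True
    # Drop shared leading bases, keeping at least one base per allele.
    while min(len(a) for a in alleles) > 1 and len({a[0] for a in alleles}) == 1:
        alleles = [a[1:] for a in alleles]
        pos += 1
    return pos, alleles


# Hypothetical toy reference: "AGGAGTCG" sits at positions 3-10.
TOY_REF = "TTAGGAGTCGCC"

# Multiallelic record, analogous to the one at 48037783 above.
print(left_align(TOY_REF, 4, ["GGAGTCG", "G", "GGAGTCGGAGTCG",
                              "GGAGTCGGAGTCGGAGTCG"]))

# Insertion allele alone, analogous to the call reported for NA18945.
print(left_align(TOY_REF, 4, ["GGAGTCG", "GGAGTCGGAGTCG"]))
```

In this toy setup both records move one base to the left (position 4 becomes 3), mirroring 48037783 becoming 48037782. The insertion taken alone trims down to `A` / `AGGAGTC`, while in the multiallelic record the deletion allele keeps REF at `AGGAGTC`, because all alleles have to be trimmed and shifted together.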
Thanks, Petr. I missed that. I need to normalize the variants that I got from VEP. Which option(s) did you use? "${HOME}/bcftools/bcftools" norm -m +any?
In the case above, it was possible to normalize the entire multiallelic record. That may not always be the case, though: one of the alleles can hinder left-alignment of the others.