samtools / bcftools

This is the official development repository for BCFtools. See installation instructions and other documentation here http://samtools.github.io/bcftools/howtos/install.html

Home Page:http://samtools.github.io/bcftools/

The INDEL/TR variant rs746071566 was not called by Bcftools v1.20 with --indels-cns in mpileup

davidyuyuan opened this issue · comments

The rs746071566 in NUDT15 is at 13:48037783 as shown in EnsEMBL http://useast.ensembl.org/Homo_sapiens/Variation/Explore?db=core;r=13:48037283-48038301;v=rs746071566;vdb=variation;vf=816843132.

Bcftools called the variant at 48037782 for the G1K sample NA18945, which is off by 1. Please note that NUDT15 is on the template strand. The difference is not related to the HGVS 3' rule.

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  NA18945
chr13   48037782        .       A       AGGAGTC 222.386 .       INDEL;IDV=14;IMF=0.368421;DP=38;VDB=0.0238714;SGB=-0.686358;RPBZ=-2.52849;MQBZ=0;MQSBZ=0;BQBZ=-4.63681;SCBZ=-0.763763;MQ0F=0;AC=1;AN=2;DP4=13,11,9,5;MQ=60  GT:PL   0/1:255,0,189

I used the following code, which invokes the new option --indels-cns. Perhaps I picked wrong options in the call and norm steps that do not work well with the new --indels-cns?

  "${HOME}/bcftools/bcftools" mpileup --indels-cns -B -r "${regions}" -Ou -f "${ref_genome}" "${bam_file}" | \
    "${HOME}/bcftools/bcftools" call --ploidy "${ploidy}" -Ou -mv | \
    "${HOME}/bcftools/bcftools" norm -a -m +any -f "${ref_genome}" -Oz -o "${output_dir}/${file}.unphased.vcf.gz" --write-index

Indel representation can be ambiguous. The consensus is to use the most left-aligned, parsimonious representation. Please read more on the topic here: https://genome.sph.umich.edu/wiki/Variant_Normalization

If you try to normalize the variant you linked to in Ensembl, you'll see that it is not left-aligned:

chr13	48037783	.	GGAGTCG	G,GGAGTCGGAGTCG,GGAGTCGGAGTCGGAGTCG	.	.	.

becomes

chr13	48037782	.	AGGAGTC	A,AGGAGTCGGAGTC,AGGAGTCGGAGTCGGAGTC	.	.	.

Thanks, Petr. I missed that. I need to normalize the variants that I got from VEP. Which option(s) did you use? "${HOME}/bcftools/bcftools" norm -m +any?

In the case above it was possible to normalize the entire multiallelic record. That may not always be the case, though: one of the alleles can hinder left-alignment of the others.
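
To make the left-alignment discussed in this thread concrete, here is a minimal Python sketch of the trim-and-extend procedure described on the Variant Normalization page linked above. It is not the bcftools implementation. The function name `normalize`, the toy reference window, and the `GGAGTCA` allele in the second record are assumptions made for illustration. Only the bases AGGAGTCG at 48037782-48037789 are taken from the REF alleles quoted above; the flanking bases are invented.

```python
def normalize(pos, alleles, ref_seq, ref_start):
    """Left-align and trim one record; alleles[0] is REF, the rest are ALTs.

    ref_seq is a window of the reference whose first base sits at the
    1-based coordinate ref_start. Assumes at least one ALT differs from REF.
    """
    alleles = list(alleles)
    changed = True
    while changed:
        changed = False
        # Drop the last base when every allele ends with the same base.
        if all(alleles) and len({a[-1] for a in alleles}) == 1:
            alleles = [a[:-1] for a in alleles]
            changed = True
        # If an allele became empty, extend all alleles one base to the left.
        if any(len(a) == 0 for a in alleles):
            pos -= 1
            if pos < ref_start:
                raise ValueError("reference window too short to left-align")
            base = ref_seq[pos - ref_start]
            alleles = [base + a for a in alleles]
            changed = True
    # Trim shared leading bases while every allele keeps at least one base.
    while min(len(a) for a in alleles) >= 2 and len({a[0] for a in alleles}) == 1:
        alleles = [a[1:] for a in alleles]
        pos += 1
    return pos, alleles


# Toy reference window. AGGAGTCG (48037782-48037789) is implied by the REF
# alleles in the thread; the "CT" and "TT" flanks are made up.
TOY_START = 48037780
TOY_REF = "CT" + "AGGAGTCG" + "TT"

# The Ensembl-style record from the thread.
ensembl = ["GGAGTCG", "G", "GGAGTCGGAGTCG", "GGAGTCGGAGTCGGAGTCG"]
print(normalize(48037783, ensembl, TOY_REF, TOY_START))

# Hypothetical record: the invented ALT GGAGTCA ends in a different base,
# so the alleles no longer share a trailing base and nothing can shift.
blocked = ["GGAGTCG", "G", "GGAGTCA"]
print(normalize(48037783, blocked, TOY_REF, TOY_START))

# The deletion on its own (as after splitting the record) still left-aligns.
print(normalize(48037783, blocked[:2], TOY_REF, TOY_START))
```

Under these assumptions, the first call moves the record to 48037782 with REF AGGAGTC, matching the normalized line above. The second record stays at 48037783 because the invented allele blocks the shift, while the deletion alone moves to 48037782 once it is considered separately.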