chrchang / plink-ng

A comprehensive update to the PLINK association analysis toolset. Beta testing of the first new version (1.90), focused on speed and memory efficiency improvements, is finishing up. Development is now focused on building out support for multiallelic, phased, and dosage data in PLINK 2.0.

Home Page: https://www.cog-genomics.org/plink/2.0/

Correctness in Ref/Alt when converting to VCF

weichisyu opened this issue

commented

Hi,

I tried to use plink2 to convert SNP array data to VCF.
Correct Ref/Alt alleles are important to me because the VCF will later be annotated against a database.
I used the --ref-from-fa and --fa arguments to correct the Ref/Alt alleles. However, I found that when a variant has only one allele in the SNP array data, plink2 keeps that allele as Ref even when it is actually an Alt allele.

e.g.
rs187071114 (G>A,C,T)
SNP Array data: all samples are A/A
VCF (converted by plink2): Ref/Alt = A/. ; GT = 0/0

Following the example above, is it possible for plink2 to convert this variant to Ref/Alt = G/A with GT = 1/1?

This is my command:

plink2 --tfam data.tfam --tped data.tped --ref-from-fa --fa /resource/gatk_bundle_ref/Homo_sapiens_assembly38.fasta --recode vcf-iid bgz --out array2vcf

The plink2 version is:
PLINK v2.00a4LM AVX2 Intel (26 Apr 2023)

Thanks

This cannot be done safely if there are any indels in your dataset.

I will look into adding an option that handles the SNP-only case, and errors out if there is a single non-SNP.
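
To make the SNP-only idea concrete, here is a minimal Python sketch. It is an illustration only, not plink2's implementation: the function names `resolve_ref_alt` and `recode_gt` are hypothetical, the reference base is assumed to have been looked up in the FASTA already, and missing calls are assumed to have been filtered out beforehand. It refuses to guess as soon as it sees anything that is not a single A/C/G/T base.

```python
from typing import Iterable, Optional, Tuple

SNP_BASES = {"A", "C", "G", "T"}


def resolve_ref_alt(fasta_base: str, observed: Iterable[str]) -> Tuple[str, Optional[str]]:
    """Return (REF, ALT) for a SNP site, taking REF from the reference base.

    fasta_base: base at this position in the reference FASTA (assumed input).
    observed:   every allele seen in the array data at this variant.
    ALT is None when all observed alleles match the reference base.
    """
    ref = fasta_base.upper()
    alleles = {a.upper() for a in observed}
    # Error out on anything that is not a plain single-base allele.
    if ref not in SNP_BASES or any(a not in SNP_BASES for a in alleles):
        raise ValueError("non-SNP allele encountered; refusing to guess REF")
    alts = sorted(a for a in alleles if a != ref)
    if len(alts) > 1:
        raise ValueError("more than one non-reference allele; not handled in this sketch")
    return ref, (alts[0] if alts else None)


def recode_gt(ref: str, call: Tuple[str, str]) -> str:
    """Recode an unphased diploid call as a VCF GT string (0 = REF, 1 = ALT)."""
    return "/".join("0" if a.upper() == ref else "1" for a in call)


if __name__ == "__main__":
    # The rs187071114 case from the first post: reference base G, all samples A/A.
    ref, alt = resolve_ref_alt("G", ["A", "A"])
    print(ref, alt, recode_gt(ref, ("A", "A")))
```

With the rs187071114 example, the reference base G and an all-A/A sample set resolve to Ref/Alt = G/A with every genotype recoded as 1/1, which is the result asked for above.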

commented

Thank you for your reply.
Does that mean I should first find a way to separate indels from SNPs?

P.S. an experiment:
I manually edited my tped file so that one sample has a genotype of G/A at this variant. After running plink2 (with --ref-from-fa and --fa), the VCF shows Ref/Alt = G/A for this variant. The sample I edited has a genotype of 0/1, while the other samples have a genotype of 1/1.

…and if you are able to do that properly, you should just use --ref-allele instead, since that is not inherently unsafe in the way that an augmented --ref-from-fa would have to be.

commented

Thanks for your advice.
I extracted the variant IDs and corresponding Ref alleles from an array annotation CSV file provided by the supplier. I then used the --ref-allele argument to pass this file to plink2. The preliminary results appear to be fine, and I am now reviewing them closely to verify the accuracy of the Ref/Alt alleles.
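
For anyone repeating this, the extraction step could look roughly like the Python sketch below. The column names `Name` and `RefAllele` are hypothetical placeholders, since supplier annotation files differ, and the function name `write_ref_allele_file` is made up for illustration. It writes one tab-separated line per variant, with the variant ID first and the Ref allele second, and skips rows where either field is empty.

```python
import csv
import io
from typing import TextIO


def write_ref_allele_file(csv_in: TextIO, out: TextIO,
                          id_col: str = "Name",
                          ref_col: str = "RefAllele") -> int:
    """Write 'variant_id<TAB>ref_allele' lines; return the number of lines written.

    id_col and ref_col are assumed header names; adjust them to the actual CSV.
    """
    written = 0
    for row in csv.DictReader(csv_in):
        variant_id = (row.get(id_col) or "").strip()
        ref_allele = (row.get(ref_col) or "").strip().upper()
        if not variant_id or not ref_allele:
            continue  # skip rows without a usable ID or Ref allele
        out.write(f"{variant_id}\t{ref_allele}\n")
        written += 1
    return written


if __name__ == "__main__":
    # Tiny in-memory example using the variant discussed in this thread.
    demo = io.StringIO("Name,RefAllele\nrs187071114,G\n")
    result = io.StringIO()
    write_ref_allele_file(demo, result)
    print(result.getvalue(), end="")
```

The output file can then be given to --ref-allele; the plink2 documentation describes the optional column-number arguments if the allele and ID columns are laid out differently.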