chrchang / plink-ng

A comprehensive update to the PLINK association analysis toolset. Beta testing of the first new version (1.90), focused on speed and memory efficiency improvements, is finishing up. Development is now focused on building out support for multiallelic, phased, and dosage data in PLINK 2.0.

Home Page: https://www.cog-genomics.org/plink/2.0/

Mismatching allele code

ys8358 opened this issue

Hello,
We have two markers that cause a problem when we compute PRS:
22:38583315:AAAAG:AAAAGAAAG
22:38583315:AAAAG:AAAAGAAAGAAAG

plink2a_dev_20190429 --pfile testOut --score test.score.txt list-variants cols=-scoreavgs,+scoresums --out PRS.test
We got a warning that “2 --score file entry was skipped due to a mismatching allele code”.

Tracing the issue back, it appears to arise at the step of extracting markers from the pgen file.

The snp.ids file contains two markers:
22:38583315:AAAAG:AAAAGAAAG
22:38583315:AAAAG:AAAAGAAAGAAAG

  1. plink2a_dev_20201028 --pfile chr22.03.dose --extract snp.ids --export vcf vcf-dosage=DS --out testOut
    testOut.vcf file:
    #CHROM POS ID REF ALT
    22 38583315 22:38583315:AAAAG:AAAAGAAAG A AAAAG
    22 38583315 22:38583315:AAAAG:AAAAGAAAGAAAG A AAAAGAAAG

  2. plink2a_dev_20190429 --vcf testOut.vcf dosage=DS --make-pgen --out testOut
    testOut.pvar file:
    #CHROM POS ID REF ALT
    22 38583315 22:38583315:AAAAG:AAAAGAAAG A AAAAG
    22 38583315 22:38583315:AAAAG:AAAAGAAAGAAAG A AAAAGAAAG

  3. Edit the testOut.pvar file as follows:
    #CHROM POS ID REF ALT
    22 38583315 22:38583315:AAAAG:AAAAGAAAG AAAAG AAAAGAAAG
    22 38583315 22:38583315:AAAAG:AAAAGAAAGAAAG AAAAG AAAAGAAAGAAAG

  4. Run the PRS prediction again with the edited .pvar file; the program runs successfully.
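A minimal sketch of why the edit in step 3 changes the outcome, assuming (as the warning suggests) that a --score entry is used only when its allele code matches the variant's REF or ALT string exactly. This is a simplified model, not plink2's implementation. The contents of test.score.txt were not posted, so the scored alleles below are hypothetical: each is taken to be the ALT as spelled in the variant ID.

```python
# Simplified model of the --score allele check (NOT plink2's source): an entry is
# kept only if its allele exactly matches REF or ALT of the variant with that ID.

def skipped_entries(pvar, entries):
    """Return the (variant ID, allele) entries matching neither REF nor ALT."""
    return [(vid, allele) for vid, allele in entries if allele not in pvar[vid]]

# REF/ALT in testOut.pvar as written in step 2.
pvar_step2 = {
    "22:38583315:AAAAG:AAAAGAAAG": ("A", "AAAAG"),
    "22:38583315:AAAAG:AAAAGAAAGAAAG": ("A", "AAAAGAAAG"),
}
# REF/ALT after the manual edit in step 3 (copied from the variant IDs).
pvar_step3 = {vid: tuple(vid.split(":")[2:4]) for vid in pvar_step2}

# Hypothetical score-file entries: the scored allele is assumed to be the ALT
# exactly as it is spelled in the variant ID.
score_entries = [(vid, vid.split(":")[3]) for vid in pvar_step2]

print(len(skipped_entries(pvar_step2, score_entries)))  # 2
print(len(skipped_entries(pvar_step3, score_entries)))  # 0
```

Under this model, nothing about the variants themselves changes between steps 2 and 3; only the allele spelling in the .pvar is brought in line with the spelling the score file uses.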

Not plink2's problem; plink2 worked as designed. You are responsible for keeping allele representation consistent across your files.

In the input file chr22.03.dose, the allele information is correct.

After subsetting with the PLINK --extract option, the allele information in the VCF file generated by PLINK is not correct. I can manually edit the .pvar file, but the best way to resolve the issue is to fix PLINK's --extract option.

Best,

Yunling

Nope, it's because you provided incorrect input. You are responsible for ensuring your files follow the same convention when it comes to variant normalization (see e.g. https://genome.sph.umich.edu/wiki/Variant_Normalization ).
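The two spellings in this thread describe the same variants under different conventions. A minimal sketch of the parsimony (allele-trimming) step of normalization, as described on the linked page: drop the shared trailing base while both alleles are longer than one base, then the shared leading base, advancing the position each time. The function names are hypothetical, and the cases that need the reference sequence (extending an allele that would become empty, and left-alignment) are deliberately not handled.

```python
def trim_alleles(pos, ref, alt):
    """Parsimony step of variant normalization for one biallelic record.

    Drops the shared last base while both alleles are longer than one base,
    then the shared first base (advancing pos by one each time). Cases that
    need the reference sequence, including left-alignment, are not handled.
    """
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


def same_variant(a, b):
    """True if two (pos, ref, alt) records trim to the same representation."""
    return trim_alleles(*a) == trim_alleles(*b)


for variant_id in ("22:38583315:AAAAG:AAAAGAAAG",
                   "22:38583315:AAAAG:AAAAGAAAGAAAG"):
    _chrom, pos, ref, alt = variant_id.split(":")
    print(variant_id, "->", trim_alleles(int(pos), ref, alt))
# 22:38583315:AAAAG:AAAAGAAAG -> (38583315, 'A', 'AAAAG')
# 22:38583315:AAAAG:AAAAGAAAGAAAG -> (38583315, 'A', 'AAAAGAAAG')

print(same_variant((38583315, "AAAAG", "AAAAGAAAG"), (38583315, "A", "AAAAG")))  # True
```

Applied to the two IDs, trimming reproduces exactly the REF/ALT pairs in testOut.vcf and testOut.pvar (A/AAAAG and A/AAAAGAAAG). So the files disagree only in convention, and the score file and the .pvar can be reconciled by converting either one to the other's allele spelling.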