Mismatching allele code
ys8358 opened this issue · comments
Hello,
Two markers cause a problem when we compute a PRS:
22:38583315:AAAAG:AAAAGAAAG
22:38583315:AAAAG:AAAAGAAAGAAAG
plink2a_dev_20190429 --pfile testOut --score test.score.txt list-variants cols=-scoreavgs,+scoresums --out PRS.test
We got a warning that “2 --score file entry was skipped due to a mismatching allele code”.
Tracing the issue back, it appears to arise at the step where the markers are extracted from the pgen file.
The snp.ids file contains two markers:
22:38583315:AAAAG:AAAAGAAAG
22:38583315:AAAAG:AAAAGAAAGAAAG
plink2a_dev_20201028 --pfile chr22.03.dose --extract snp.ids --export vcf vcf-dosage=DS --out testOut
testOut.vcf file:
#CHROM POS ID REF ALT
22 38583315 22:38583315:AAAAG:AAAAGAAAG A AAAAG
22 38583315 22:38583315:AAAAG:AAAAGAAAGAAAG A AAAAGAAAG
plink2a_dev_20190429 --vcf testOut.vcf dosage=DS --make-pgen --out testOut
testOut.pvar file:
#CHROM POS ID REF ALT
22 38583315 22:38583315:AAAAG:AAAAGAAAG A AAAAG
22 38583315 22:38583315:AAAAG:AAAAGAAAGAAAG A AAAAGAAAG
We edited the testOut.pvar file as follows:
#CHROM POS ID REF ALT
22 38583315 22:38583315:AAAAG:AAAAGAAAG AAAAG AAAAGAAAG
22 38583315 22:38583315:AAAAG:AAAAGAAAGAAAG AAAAG AAAAGAAAGAAAG
When we ran the PRS prediction again with the new pvar file, the program ran successfully.
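The discrepancy between the IDs and the allele columns can be checked mechanically. Below is a minimal Python sketch, assuming the variant IDs follow the CHROM:POS:REF:ALT pattern seen above. The function name `find_id_allele_mismatches` is hypothetical and not part of plink2; the two row lists are copied from the .pvar contents shown in this post.

```python
def find_id_allele_mismatches(pvar_lines):
    """Return IDs whose encoded alleles differ from the REF/ALT columns.

    Assumes whitespace-separated rows (#CHROM POS ID REF ALT) and IDs of
    the form CHROM:POS:REF:ALT; rows with other ID shapes are skipped.
    """
    mismatches = []
    for line in pvar_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        chrom, pos, var_id, ref, alt = line.split()[:5]
        id_parts = var_id.split(":")
        if len(id_parts) != 4:
            continue  # ID does not follow the assumed pattern
        if (id_parts[2], id_parts[3]) != (ref, alt):
            mismatches.append(var_id)
    return mismatches


# Rows as they appear in the exported testOut.pvar
exported = [
    "#CHROM POS ID REF ALT",
    "22 38583315 22:38583315:AAAAG:AAAAGAAAG A AAAAG",
    "22 38583315 22:38583315:AAAAG:AAAAGAAAGAAAG A AAAAGAAAG",
]

# Rows after the manual edit
edited = [
    "#CHROM POS ID REF ALT",
    "22 38583315 22:38583315:AAAAG:AAAAGAAAG AAAAG AAAAGAAAG",
    "22 38583315 22:38583315:AAAAG:AAAAGAAAGAAAG AAAAG AAAAGAAAGAAAG",
]

print(find_id_allele_mismatches(exported))  # both IDs are reported
print(find_id_allele_mismatches(edited))    # prints []
```

This only compares the ID text with the allele columns; it says nothing about which of the two representations is the intended one.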
Not plink2's problem; plink2 worked as designed. You are responsible for keeping allele representation consistent across your files.
In the input file chr22.03.dose, the allele information is correct.
After subsetting with the PLINK --extract option, the allele information in the VCF file generated by PLINK is not correct. I can manually edit the .pvar file, but the best way to resolve the issue would be to fix the --extract option in PLINK.
Best,
Yunling
Nope, it's because you provided incorrect input. You are responsible for ensuring your files follow the same convention when it comes to variant normalization (see e.g. https://genome.sph.umich.edu/wiki/Variant_Normalization ).
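As a rough illustration of what that convention implies for these two markers: trimming the shared trailing bases from the alleles encoded in the IDs yields the REF/ALT pairs that appear in testOut.vcf. The Python sketch below is a simplification under stated assumptions. `trim_alleles` is a hypothetical helper, not a plink2 function, and it only trims shared bases; full normalization as described on the linked page also left-aligns against a reference genome, which is omitted here.

```python
def trim_alleles(pos, ref, alt):
    """Trim bases shared by REF and ALT, keeping at least one base in each.

    Simplified sketch: no reference genome is consulted, so no
    left-alignment is performed.
    """
    # Drop shared trailing bases; POS is unchanged by this step.
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # Drop shared leading bases; each removed base shifts POS right by one.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


# Alleles taken from the two variant IDs in this thread
print(trim_alleles(38583315, "AAAAG", "AAAAGAAAG"))
# → (38583315, 'A', 'AAAAG')
print(trim_alleles(38583315, "AAAAG", "AAAAGAAAGAAAG"))
# → (38583315, 'A', 'AAAAGAAAG')
```

Both results match the REF and ALT columns of the exported files, so the IDs and the allele columns describe the same insertions in two different representations. A score file has to use whichever representation the .pvar file uses.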