parklab / NGSCheckMate

Software program for checking sample matching for NGS data

45 bases long SNPs

vladsavelyev opened this issue

I tried to manually generate an *.ncm VAF file from an existing VCF file so that I could feed it into vaf_ncm.py along with other VAF files. I wanted to use NGSCheckMate/SNP/SNP_GRCh37_hg19_wChr.bed as a baseline, but I noticed that some SNPs there are 45 bases long, in particular:

chr15   66994784        66994830        rs45536731      C       T
chr15   66995277        66995323        rs45549936      A       G
chr5    42565797        42565843        rs45550441      C       T
chr5    42719780        42719826        rs45458097      A       G

I'm worried that this might affect the normal operation of the tool. E.g., when I pass that BED file to ncm.py -B, will it call those SNPs correctly? And how can I find the positions of the SNPs in the .ncm VAF files?

Hi vladsavelyev,

Thank you for finding this problem. We used the snp138CodingDbSnp bed file from the UCSC table browser. I found that those SNPs (rs45536731, rs45549936, rs45550441, rs45458097) are 45 bases long in the snp138CodingDbSnp bed file. Those four SNPs were deleted on Jun 15, 2015, due to mapping or clustering errors.

In this situation, the output VCF files may contain information for the complete 45 bp loci when you use the ncm.py -B option.
However, our evaluation algorithm selects only SNPs that perfectly match the bed file (mapping key = chromosome + second location).
For example, in the "chr15 66994784 66994830" case, our method chooses only the chr15_66994830 SNP for the evaluation.
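
To illustrate the mapping key described above, here is a minimal sketch that builds a "chromosome + second location" key from one line of the bed file. It assumes whitespace-separated fields in the order shown in this thread; `mapping_key` is a hypothetical helper written for illustration, not a function from NGSCheckMate.

```python
def mapping_key(bed_line):
    """Return 'chrom_end' for one BED line (hypothetical helper, not NGSCheckMate code)."""
    fields = bed_line.split()
    chrom = fields[0]
    end = fields[2]  # the "second location" column of the bed file
    return f"{chrom}_{end}"


line = "chr15   66994784        66994830        rs45536731      C       T"
print(mapping_key(line))  # chr15_66994830
```

Under this assumption, only a VCF record at chr15:66994830 would be matched for that entry, regardless of how far the interval extends to the left.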

We will change those four SNPs as follows:
chr15 66994829 66994830 rs12521020 C T
chr15 66995322 66995323 rs4776822 A G
chr5 42565842 42565843 rs12521020 C T
chr5 42719825 42719826 rs2910875 A G

The performance results will remain the same as they are now.
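
For anyone who wants to apply the same kind of coordinate fix to a local copy of the bed file, the sketch below shrinks any interval longer than one base to its last base, which keeps the second location (and therefore the mapping key) unchanged. It assumes whitespace-separated BED fields; `collapse_to_last_base` is a hypothetical helper, and it only adjusts the start coordinate. It does not update the rsID column, which the corrected entries above also change.

```python
def collapse_to_last_base(bed_line):
    """Shrink a multi-base BED interval to its final base (hypothetical helper).

    Only the start column is rewritten; the end column, rsID, and alleles
    are passed through untouched.
    """
    fields = bed_line.split()
    start, end = int(fields[1]), int(fields[2])
    if end - start > 1:
        fields[1] = str(end - 1)  # keep the end coordinate, move the start next to it
    return "\t".join(fields)


line = "chr15   66994784        66994830        rs45536731      C       T"
print(collapse_to_last_base(line))
```

Entries that already span a single base pass through with their coordinates unchanged.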

Thank you,
Best regards,
Sejoon Lee.