cumc / xqtl-protocol

Molecular QTL analysis protocol developed by ADSP Functional Genomics Consortium

Home Page: https://cumc.github.io/xqtl-protocol/

Large number of SNPs discarded by sumstat standardizer for ADGWAS

hsun3163 opened this issue

From the stderr:

/home/hs3163/miniconda3/lib/python3.9/site-packages/cugg/utils.py:27: UserWarning: There are SNPs 960: REF:ALT = ALT:REF. They will be removed.
  warnings.warn("There are SNPs {}: REF:ALT = ALT:REF. They will be removed.".format(sum(indels)))

In the stdout:

/mnt/vast/hpc/csg/xqtl_workflow_testing/ADGWAS/data_intergration/ADGWAS2022/ADGWAS_Bellenguez_2022.1/ADGWAS_Bellenguez_2022.1.yml False False
Total number of sumstats:  1
{'/mnt/vast/hpc/csg/xqtl_workflow_testing/ADGWAS/ADGWAS2022.chr1.sumstat.tsv': {'ID': 'CHR,POS,A0,A1', 'CHR': 'chromosome', 'POS': 'base_pair_location', 'A0': 'other_allele', 'A1': 'effect_allele', 'STAT': 'beta', 'SE': 'standard_error', 'P': 'p_value', 'maf': 'maf', 'n_cases': 'n_cases', 'n_controls': 'n_controls', 'original_effect_allele_frequency': 'effect_allele_frequency'}}
Total rows of query:  1682696 Total rows of subject:  1683176
All are done

However, the actual output file has only 811,298 lines, far fewer than the 1,682,696 query rows:

hs3163@node58:/mnt/vast/hpc/csg/xqtl_workflow_testing/ADGWAS/data_intergration/ADGWAS2022/ADGWAS_Bellenguez_2022.1$ wc ADGWAS2022.chr1.sumstat.tsv
  811298  9735576 69556956 ADGWAS2022.chr1.sumstat.tsv

It is a priority to solve this.
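
To put a number on the loss, the row counts reported above can be compared directly. The following is only a minimal sketch: `retained_fraction` is a hypothetical helper (not part of cugg or the xqtl-protocol), and it assumes that the `wc` line count includes exactly one header line, which is not confirmed by the logs.

```python
def retained_fraction(n_query: int, n_output_lines: int, header_lines: int = 1) -> float:
    """Fraction of query rows that survive in the output file.

    header_lines is an assumption: the output is taken to have one header row.
    """
    if n_query <= 0:
        raise ValueError("n_query must be positive")
    n_rows = n_output_lines - header_lines
    return n_rows / n_query


# Counts taken from the stdout log and the `wc` output above.
frac = retained_fraction(n_query=1682696, n_output_lines=811298)
print(f"retained {frac:.1%}, discarded {1 - frac:.1%}")
```

Under that assumption, roughly half of the chr1 query rows are missing from the output, far more than the 960 SNPs mentioned in the warning.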

This error comes from a bug in CUGG's check_snp function, which did not compare every SNP. With the new fix, the problem no longer occurs. The corrected run reports:

Total rows of query:  1682696 Total rows of subject:  1683176
Total exact:  1014648 Total flip:  783441 Total reverse:  230816 Total ambiguous:  232182
Total remaining:  1451880
All are done
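
For readers unfamiliar with the categories in the log above, here is a rough sketch of how a query allele pair might be compared against a subject (reference) pair. This is not CUGG's check_snp implementation; the function names and the exact definitions of "exact", "flip", "reverse" and "ambiguous" are assumptions chosen for illustration. The four counts in the log add up to more than the number of query rows, which suggests the categories can overlap, so the sketch returns a set of labels rather than a single one.

```python
# Illustrative only: NOT the cugg implementation. Assumes alleles contain only A/C/G/T.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def strand_flip(allele: str) -> str:
    """Reverse-complement an allele string."""
    return "".join(COMPLEMENT[base] for base in reversed(allele.upper()))


def classify_alleles(q_a0: str, q_a1: str, s_a0: str, s_a1: str) -> set:
    """Return the set of labels describing how query (q) alleles relate to subject (s) alleles."""
    q = (q_a0.upper(), q_a1.upper())
    s = (s_a0.upper(), s_a1.upper())
    labels = set()
    if q == s:
        labels.add("exact")      # same alleles in the same order
    if q == (s[1], s[0]):
        labels.add("reverse")    # same alleles, A0/A1 swapped
    flipped = (strand_flip(s[0]), strand_flip(s[1]))
    if q == flipped or q == (flipped[1], flipped[0]):
        labels.add("flip")       # matches only after taking the other strand
    if len(q[0]) == 1 and len(q[1]) == 1 and strand_flip(q[0]) == q[1]:
        labels.add("ambiguous")  # palindromic SNP (A/T or C/G): strand cannot be inferred
    return labels


print(classify_alleles("A", "G", "A", "G"))
print(classify_alleles("G", "A", "A", "G"))
print(classify_alleles("T", "C", "A", "G"))
print(classify_alleles("A", "T", "A", "T"))
```

A query pair that yields an empty set matches the subject under none of these definitions. How the real function turns such labels into the "Total remaining" figure is not shown in this thread.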

This error should not affect the QTL results, which took a separate route.