Large number of SNPs discarded by sumstat standardizer for ADGWAS
hsun3163 opened this issue
From the stderr:
/home/hs3163/miniconda3/lib/python3.9/site-packages/cugg/utils.py:27: UserWarning: There are SNPs 960: REF:ALT = ALT:REF. They will be removed.
warnings.warn("There are SNPs {}: REF:ALT = ALT:REF. They will be removed.".format(sum(indels)))
In the stdout:
/mnt/vast/hpc/csg/xqtl_workflow_testing/ADGWAS/data_intergration/ADGWAS2022/ADGWAS_Bellenguez_2022.1/ADGWAS_Bellenguez_2022.1.yml False False
Total number of sumstats: 1
{'/mnt/vast/hpc/csg/xqtl_workflow_testing/ADGWAS/ADGWAS2022.chr1.sumstat.tsv': {'ID': 'CHR,POS,A0,A1', 'CHR': 'chromosome', 'POS': 'base_pair_location', 'A0': 'other_allele', 'A1': 'effect_allele', 'STAT': 'beta', 'SE': 'standard_error', 'P': 'p_value', 'maf': 'maf', 'n_cases': 'n_cases', 'n_controls': 'n_controls', 'original_effect_allele_frequency': 'effect_allele_frequency'}}
Total rows of query: 1682696 Total rows of subject: 1683176
All are done
However, the actual output file contains only 811,298 lines, fewer than half of the query rows:
hs3163@node58:/mnt/vast/hpc/csg/xqtl_workflow_testing/ADGWAS/data_intergration/ADGWAS2022/ADGWAS_Bellenguez_2022.1$ wc ADGWAS2022.chr1.sumstat.tsv
811298 9735576 69556956 ADGWAS2022.chr1.sumstat.tsv
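To put a number on the loss, here is a minimal sketch that compares the query row count from the stdout with the line count reported by `wc`. The helper name `retained_fraction` is made up for illustration, and the `wc` count may include a header line, so the figure is approximate.

```python
def retained_fraction(rows_in: int, rows_out: int) -> float:
    """Fraction of input rows that survive standardization."""
    if rows_in <= 0:
        raise ValueError("rows_in must be positive")
    return rows_out / rows_in


# Numbers copied from the logs in this issue.
query_rows = 1_682_696   # "Total rows of query"
output_lines = 811_298   # first column of `wc`; may include a header line

frac = retained_fraction(query_rows, output_lines)
print(f"retained {frac:.1%}, discarded {1 - frac:.1%}")
# prints: retained 48.2%, discarded 51.8%
```

Only 960 SNPs are flagged for removal in the stderr warning, so a loss of roughly half the rows is far beyond what that warning would explain.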
It is a priority to solve this.
This error comes from an inherent problem in CUGG's check_snp function, where not all of the SNPs were compared. With the new fix, the problem no longer occurs. The corrected sumstat run should report:
Total rows of query: 1682696 Total rows of subject: 1683176
Total exact: 1014648 Total flip: 783441 Total reverse: 230816 Total ambiguous: 232182
Total remaining: 1451880
All are done
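For readers unfamiliar with the categories in the log, the sketch below shows one common way to classify a query SNP against a subject SNP at the same position. It is not the cugg implementation: the function names, the mutually exclusive category scheme, and the extra `reverse_flip` and `mismatch` labels are assumptions made for illustration. The totals in the log above sum to more than the number of rows, so cugg evidently counts them differently. The point of the sketch is that every query row gets compared and counted.

```python
from collections import Counter

# Complement table for strand reversal; other characters pass through unchanged.
_COMP = str.maketrans("ACGT", "TGCA")


def revcomp(allele: str) -> str:
    """Reverse complement of an allele string (hypothetical helper)."""
    return allele.upper().translate(_COMP)[::-1]


def classify(query: tuple, subject: tuple) -> str:
    """Classify a (ref, alt) query pair against a (ref, alt) subject pair."""
    q = tuple(a.upper() for a in query)
    s = tuple(a.upper() for a in subject)
    # Palindromic pairs (A/T, C/G): strand cannot be told from alleles alone.
    if revcomp(q[0]) == q[1]:
        return "ambiguous"
    if q == s:
        return "exact"
    if q == (s[1], s[0]):
        return "flip"          # same strand, ref and alt swapped
    rc = (revcomp(s[0]), revcomp(s[1]))
    if q == rc:
        return "reverse"       # opposite strand, same orientation
    if q == (rc[1], rc[0]):
        return "reverse_flip"  # opposite strand and swapped
    return "mismatch"


def summarize(pairs) -> Counter:
    """Classify every (query, subject) pair; no row is skipped."""
    return Counter(classify(q, s) for q, s in pairs)


# Toy data, not taken from the ADGWAS sumstats.
pairs = [
    (("A", "G"), ("A", "G")),
    (("G", "A"), ("A", "G")),
    (("T", "C"), ("A", "G")),
    (("A", "T"), ("A", "T")),
]
counts = summarize(pairs)
assert sum(counts.values()) == len(pairs)  # every pair was compared
```

Checking that the category counts add up to the number of input pairs is a cheap guard against the kind of silent row loss described in this issue.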
This error should not affect the QTL results, which took a separate route.