cumc / xqtl-protocol

Molecular QTL analysis protocol developed by ADSP Functional Genomics Consortium

Home Page:https://cumc.github.io/xqtl-protocol/


Sumstat standardizer fails to handle reverse strand

hsun3163 opened this issue · comments

In the snp_match_dup() function, the SNP ID and the A0/A1 alleles are handled by the line below, which cannot handle SNPs reported on the reverse strand.

new_query = pd.concat([new_subject.iloc[:,:5],query.loc[pm.qidx].iloc[:,5:]],axis=1)

[Screenshot: Annotation 2023-01-11 151255]

Also, the flip should be "TRUE"; this is in the allele_flip_qc() function.
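For context, here is a minimal sketch of how an allele pair can be classified against a reference pair, including the reverse-strand case. It is not the protocol's implementation: the function names, the returned labels, and the argument order are all hypothetical, chosen only for illustration.

```python
# Hypothetical illustration; not taken from xqtl-protocol.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def reverse_complement(allele):
    """Return the reverse complement of an allele string (e.g. 'AC' -> 'GT')."""
    return "".join(COMPLEMENT[base] for base in reversed(allele))

def classify_alleles(ref_a0, ref_a1, qry_a0, qry_a1):
    """Classify a query allele pair relative to a reference allele pair.

    Returns one of: 'match', 'flip', 'strand', 'strand_flip', 'mismatch'.
    Palindromic pairs (A/T, C/G) are ambiguous; they are resolved here as
    'match' or 'flip' simply because those checks run first.
    """
    if (qry_a0, qry_a1) == (ref_a0, ref_a1):
        return "match"
    if (qry_a0, qry_a1) == (ref_a1, ref_a0):
        return "flip"  # same strand, alleles swapped: effect sign must be negated
    c0, c1 = reverse_complement(qry_a0), reverse_complement(qry_a1)
    if (c0, c1) == (ref_a0, ref_a1):
        return "strand"  # reverse strand, same allele order
    if (c0, c1) == (ref_a1, ref_a0):
        return "strand_flip"  # reverse strand and alleles swapped
    return "mismatch"

print(classify_alleles("A", "G", "T", "C"))
```

A standardizer that only copies the reference columns over the query columns would lose the information needed to tell these cases apart, which is why the classification has to happen before the columns are replaced.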


Actually, the flipping is not an issue: the SNP ID is misleading, but the actual ref and alt alleles are correct.

This is because the order of A0 and A1 in the index is determined by alphabetical order, as shown below. This was kept so that some other functions keep running. Once those dependencies are resolved, this is to be changed.

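def namebyordA0_A1(df, cols=['CHR', 'POS', 'A0', 'A1']):
    # Note: this renames the columns of the caller's DataFrame in place.
    df.columns = cols
    # Prefix = any extra columns, then CHR and POS, joined with ':' (e.g. "1:1000").
    prefix = df[[x for x in cols if x not in ['CHR', 'POS', 'A0', 'A1']] + ['CHR', 'POS']].astype(str).agg(':'.join, axis=1)
    names = []
    for p, A0, A1 in zip(prefix, df.A0, df.A1):
        # The alphabetically larger allele always comes first, regardless of which is ref or alt.
        tmp = A0 + ':' + A1 if A0 > A1 else A1 + ':' + A0
        names.append('_'.join([p, tmp]))
    return names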
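To illustrate why the ID can look misleading, the sketch below restates the same ordering rule in a simplified helper (the name ordered_allele_id is hypothetical, and it assumes the four default columns only). Two records at the same position with swapped A0/A1 receive the same ID, so the ID alone does not say which allele is ref and which is alt.

```python
import pandas as pd

def ordered_allele_id(df):
    """Simplified, hypothetical restatement of the ID rule used by namebyordA0_A1."""
    ids = []
    for chrom, pos, a0, a1 in zip(df["CHR"], df["POS"], df["A0"], df["A1"]):
        # Alphabetically larger allele goes first, whatever its role.
        first, second = (a0, a1) if a0 > a1 else (a1, a0)
        ids.append(f"{chrom}:{pos}_{first}:{second}")
    return ids

records = pd.DataFrame({
    "CHR": [1, 1],
    "POS": [1000, 1000],
    "A0": ["A", "G"],  # second row has A0 and A1 swapped
    "A1": ["G", "A"],
})
ids = ordered_allele_id(records)
print(ids)  # both rows get the ID "1:1000_G:A"
```

Because of this, the A0 and A1 columns, not the ID string, should be treated as the source of truth for allele orientation.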

A quick fix to the issue is shown in the attached screenshot (Capture1), but we need to see whether there is any residual impact.

There doesn't seem to be any residual impact; closing for now.