cumc / xqtl-protocol

Molecular QTL analysis protocol developed by ADSP Functional Genomics Consortium

Home Page: https://cumc.github.io/xqtl-protocol/

Update on susie breaks the code

hsun3163 opened this issue · comments

The merging of genoFile and phenoFile, and many subsequent operations, depend on the column names. Setting header = 0 will produce an error. I will add a header fix to pd.read_csv when I have time.

genoFile = pd.read_csv(genoFile, sep = "\t", header=0)

if len(phenoFile) != len(covFile):
    raise ValueError("Number of input phenotypes files must match that of covariates files")
if len(phenoFile) != len(phenotype_names):
    raise ValueError("Number of input phenotypes files must match the number of phenotype names")
## pos and covar are condition-specific; this way, when there is no phenotype file,
## there is NA in the corresponding column.
phenoFile = [
    pd.read_csv(x, sep="\t", header=0)
    .assign(pos=lambda y: y['#chr'] + ':' + y['start'].astype("str") + '-' + y['end'].astype("str"))
    .assign(cov_path=z, cond=a)
    .drop(columns=["#chr", "start", "end"])
    .rename(columns={"ID": "#id"})
    for x, z, a in zip(phenoFile, covFile, phenotype_names)
]
for i in range(len(phenoFile)):
    genoFile = genoFile.merge(phenoFile[i], on='#id', how='left', suffixes = (f'{i}_x', f'{i}_y'))
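
To illustrate what the merge loop above does to the column names, here is a minimal sketch on toy data. The tables, the `geno_path` column, and the condition labels are made up for illustration and are not taken from the protocol. It shows that the left merge keeps every `#id` from the genotype table, fills NA where a condition has no entry, and applies the per-iteration suffixes only when column names collide.

```python
import pandas as pd

# Hypothetical inputs: a gene list and two per-condition phenotype tables.
geno = pd.DataFrame({"#id": ["gene1", "gene2"],
                     "geno_path": ["g1.bed", "g2.bed"]})
pheno = [
    pd.DataFrame({"#id": ["gene1", "gene2"],
                  "pos": ["chr1:1-100", "chr1:200-300"],
                  "cond": "tissueA"}),
    # The second condition has no entry for gene2.
    pd.DataFrame({"#id": ["gene1"],
                  "pos": ["chr1:1-100"],
                  "cond": "tissueB"}),
]

merged = geno
for i, p in enumerate(pheno):
    # Same call shape as in the issue: left merge on '#id' with indexed suffixes.
    merged = merged.merge(p, on="#id", how="left",
                          suffixes=(f"{i}_x", f"{i}_y"))

print(list(merged.columns))
print(merged)
```

In this toy case the first merge has no overlapping columns, so no suffix is added. The second merge renames the colliding `pos` and `cond` columns, and gene2 gets NA in the columns coming from the second condition. All of this relies on `#id` being an actual column name, which is why the header handling in `pd.read_csv` matters.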

I was mistaken about the behavior of pd.read_csv: I thought header = 0 was the same as header = False.
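
For reference, a small sketch of the difference, using an in-memory table with made-up contents rather than a real genotype list. `header=0` takes the first line as the column names, while `header=None` treats every line as data and numbers the columns. As far as I know, recent pandas versions reject a boolean `header` with a TypeError, so `header=None` is the way to say "no header".

```python
import io
import pandas as pd

# Hypothetical two-row table standing in for a genotype list file.
text = "#id\tpath\ngene1\tgeno1.bed\ngene2\tgeno2.bed\n"

# header=0: the first line supplies the column names.
with_header = pd.read_csv(io.StringIO(text), sep="\t", header=0)

# header=None: no line is treated as a header; columns are numbered 0, 1, ...
no_header = pd.read_csv(io.StringIO(text), sep="\t", header=None)

print(list(with_header.columns))  # ['#id', 'path']
print(list(no_header.columns))    # [0, 1]
print(len(with_header), len(no_header))  # 2 3
```

With `header=None` there is no `#id` column, so a later `merge(..., on='#id')` would fail; with `header=0` the names are available as the merge expects.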

This commit still breaks things, so I will reopen the ticket.
cbbc77a

Fixed.