Update on susie breaks the code


hsun3163 opened this issue 8 months ago

The merge of genoFile and phenoFile, and many subsequent operations, depend on the column names, so setting header = 0 will produce an error. I will fix the header argument passed to pd.read_csv when I have time.
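
The merge keys on the `#id` column, so it only works when pandas parses the header row into column names. Below is a minimal sketch using made-up in-memory data (the sample IDs, the `GT` column and the `pheno` table are hypothetical). It shows what happens when the header row is instead read as data (`header=None`): the columns are numbered 0, 1, … and a merge on `#id` raises a `KeyError`.

```python
import io

import pandas as pd

# Hypothetical tab-separated genotype table that does have a header row.
geno_text = "#id\tGT\ns1\t0/1\ns2\t1/1\n"
pheno = pd.DataFrame({"#id": ["s1"], "value": [0.5]})

# header=None: the header line is read as a data row and columns become 0, 1.
no_header = pd.read_csv(io.StringIO(geno_text), sep="\t", header=None)
print(list(no_header.columns))  # [0, 1]

# The left table has no "#id" column, so the merge cannot find its key.
merge_error = None
try:
    no_header.merge(pheno, on="#id", how="left")
except KeyError as err:
    merge_error = err
print("merge failed, missing key:", merge_error)
```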

```python
import pandas as pd

# Inputs defined earlier: genoFile is a path; phenoFile, covFile and phenotype_names
# are equal-length lists, one entry per condition.
# header=0 uses the first row of each file as the column names.
genoFile = pd.read_csv(genoFile, sep="\t", header=0)

if len(phenoFile) != len(covFile):
    raise ValueError("Number of input phenotype files must match that of covariate files")
if len(phenoFile) != len(phenotype_names):
    raise ValueError("Number of input phenotype files must match the number of phenotype names")
## pos and cov_path are condition specific, so after the left merge any #id that is
## absent from a condition's phenotype file gets NA in that condition's columns.
phenoFile = [pd.read_csv(x, sep="\t", header=0)
             .assign(pos=lambda y: y["#chr"].astype(str) + ":" + y["start"].astype(str) + "-" + y["end"].astype(str))
             .assign(cov_path=z, cond=a)
             .drop(columns=["#chr", "start", "end"])
             .rename(columns={"ID": "#id"})
             for x, z, a in zip(phenoFile, covFile, phenotype_names)]
for i in range(len(phenoFile)):
    genoFile = genoFile.merge(phenoFile[i], on="#id", how="left", suffixes=(f"{i}_x", f"{i}_y"))
```
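
To see what the merge loop produces, here is a self-contained sketch that reruns the same steps on small in-memory tables instead of files. `merge_conditions` is a hypothetical helper, not part of the original code, and the gene IDs, coordinates, covariate paths and condition names are all made up. `geneB` is missing from the second condition, so its columns for that condition come out as NA.

```python
import pandas as pd


def merge_conditions(geno, phenos, cov_paths, names):
    """Hypothetical helper mirroring the snippet above on in-memory DataFrames."""
    if not (len(phenos) == len(cov_paths) == len(names)):
        raise ValueError("phenotype tables, covariate paths and names must have equal length")
    tables = [
        p.assign(pos=lambda y: y["#chr"].astype(str) + ":" + y["start"].astype(str) + "-" + y["end"].astype(str))
        .assign(cov_path=c, cond=n)
        .drop(columns=["#chr", "start", "end"])
        .rename(columns={"ID": "#id"})
        for p, c, n in zip(phenos, cov_paths, names)
    ]
    merged = geno
    for i, t in enumerate(tables):
        # Left merge keeps every #id from geno; overlapping columns get the i-specific suffixes.
        merged = merged.merge(t, on="#id", how="left", suffixes=(f"{i}_x", f"{i}_y"))
    return merged


geno = pd.DataFrame({"#id": ["geneA", "geneB"], "geno_path": ["a.bed", "b.bed"]})
pheno_cond1 = pd.DataFrame({"#chr": ["chr1", "chr1"], "start": [100, 500], "end": [101, 501], "ID": ["geneA", "geneB"]})
pheno_cond2 = pd.DataFrame({"#chr": ["chr1"], "start": [100], "end": [101], "ID": ["geneA"]})  # geneB absent

merged = merge_conditions(geno, [pheno_cond1, pheno_cond2], ["cov1.gz", "cov2.gz"], ["cond1", "cond2"])
print(merged.columns.tolist())
print(merged)
```

Note the column names: suffixes are only applied to overlapping columns. With two conditions, the first condition's columns become `pos1_x`, `cov_path1_x`, `cond1_x` and the second's become `pos1_y`, `cov_path1_y`, `cond1_y`. With a third condition, its `pos` would not overlap any existing name and would stay unsuffixed. Renaming each table's columns with its condition name before merging might give more predictable names.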

hsun3163 commented 8 months ago

Fixed.

hsun3163 commented on Fri Nov 10 2023 09:41:25 GMT+0800 (China Standard Time)

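I was mistaken about the behavior of pd.read_csv: I thought header = 0 was the same as header = False. In fact, header = 0 uses the first row as the column names, and header = None is the setting for a file without a header row.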
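
The difference is easy to check on a small example. This sketch reads the same hypothetical two-column, tab-separated text twice. With `header=0` the first line becomes the column names. With `header=None` the first line is kept as a data row and pandas assigns integer column names.

```python
import io

import pandas as pd

text = "#id\tvalue\ns1\t0.5\ns2\t1.5\n"

# header=0: the first line becomes the column names.
with_header = pd.read_csv(io.StringIO(text), sep="\t", header=0)
print(list(with_header.columns))  # ['#id', 'value']
print(len(with_header))           # 2

# header=None: every line, including the first, is data; columns are 0, 1.
without_header = pd.read_csv(io.StringIO(text), sep="\t", header=None)
print(list(without_header.columns))  # [0, 1]
print(len(without_header))           # 3
```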

hsun3163 commented on Fri Nov 10 2023 10:42:22 GMT+0800 (China Standard Time)

This commit still breaks things, so I am reopening the ticket.
cbbc77a