Update on susie breaks the code
hsun3163 opened this issue · comments
The merging of genoFile and phenoFile, and many subsequent operations, depend on the column names. Setting header = 0 will produce an error. Will add a fix for the header argument of pd.read_csv when I have time.
genoFile = pd.read_csv(genoFile, sep = "\t", header=0)
if len(phenoFile) != len(covFile):
    raise ValueError("Number of input phenotypes files must match that of covariates files")
if len(phenoFile) != len(phenotype_names):
    raise ValueError("Number of input phenotypes files must match the number of phenotype names")
## pos and covar are condition specific, this way when there is no phenotype file, there is na in the corresponding column.
phenoFile = [pd.read_csv(x, sep = "\t", header=0).assign(pos = lambda y:y['#chr']+':'+y['start'].astype("str")+'-'+
             y['end'].astype("str")).assign(cov_path = z, cond = a ).drop(columns = ["#chr","start","end"]).rename(columns = {"ID":"#id"})
             for x,z,a in zip(phenoFile,covFile,phenotype_names)]
for i in range(len(phenoFile)):
    genoFile = genoFile.merge(phenoFile[i], on='#id', how='left', suffixes = (f'{i}_x', f'{i}_y'))
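To make the dependence on column names concrete, here is a minimal sketch of the same reshape-and-merge step on in-memory data. The toy values (`g1`, `g2`, `cov1.tsv`, `tissueA`, the `path` column) are made up for illustration and are not taken from the real pipeline; only the column names `#id`, `#chr`, `start`, `end` and `ID` come from the snippet above.

```python
import pandas as pd

# Hypothetical stand-ins for the parsed genotype and phenotype tables.
geno = pd.DataFrame({"#id": ["g1", "g2"], "path": ["g1.bed", "g2.bed"]})
pheno = pd.DataFrame(
    {"#chr": ["chr1"], "start": [100], "end": [200], "ID": ["g1"]}
)

# Same chain as in the issue: build "chr:start-end", attach the
# condition-specific columns, drop the raw coordinates, rename ID to #id.
pheno = (
    pheno.assign(
        pos=lambda y: y["#chr"] + ":" + y["start"].astype("str")
        + "-" + y["end"].astype("str")
    )
    .assign(cov_path="cov1.tsv", cond="tissueA")
    .drop(columns=["#chr", "start", "end"])
    .rename(columns={"ID": "#id"})
)

# The left merge keeps every genotype row; rows without a matching
# phenotype get missing values in pos, cov_path and cond.
merged = geno.merge(pheno, on="#id", how="left")
print(merged)
```

If the files were read without their header row, none of `#chr`, `start`, `end`, `ID` or `#id` would exist as column names, so the `assign`, `drop` and `merge` calls would all fail with a `KeyError`.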
I was mistaken about the behavior of pd.read_csv: I thought header = 0 was the same as header = False.
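For anyone hitting the same confusion, a small sketch of how the `header` argument behaves. The one-row TSV below is invented for illustration: `header=0` takes the column names from the first line, while `header=None` is how pandas is told that the file has no header row.

```python
import io
import pandas as pd

# Hypothetical one-row TSV with a header line.
tsv = "#id\t#chr\tstart\tend\ngene1\tchr1\t100\t200\n"

# header=0: the first line supplies the column names.
with_header = pd.read_csv(io.StringIO(tsv), sep="\t", header=0)

# header=None: no header row is assumed; columns are numbered from 0
# and the first line is kept as an ordinary data row.
no_header = pd.read_csv(io.StringIO(tsv), sep="\t", header=None)

print(list(with_header.columns))  # ['#id', '#chr', 'start', 'end']
print(list(no_header.columns))    # [0, 1, 2, 3]
print(len(with_header), len(no_header))  # 1 2
```

So `header=0` is what the merge on `'#id'` needs, since it keeps the named columns.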
Fixed.