TensorQTL cis mapping cannot handle multiple cis windows in the regionlist that cover the same position
hsun3163 opened this issue
hsun3163 commented
After the following code is removed:
phenotype_pos_df = phenotype_pos_df.assign(
    # deviate_ratio measures how far pos deviates from the middle of the customized cis window
    deviate_ratio=abs(0.5 - (phenotype_pos_df.pos - phenotype_pos_df.start) / (phenotype_pos_df.end - phenotype_pos_df.start))
).sort_values(by=[phenotype_id, "deviate_ratio"]).drop_duplicates(phenotype_id, keep="first").set_index(phenotype_id)[["chr", "start", "end"]]
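The removed step can be illustrated on a toy table. This is a minimal sketch, not the pipeline's actual input: the column names `ID`, `chr`, `pos`, `start`, `end` and the values are hypothetical, with one row per phenotype/window pair. For each phenotype it keeps only the window whose middle is closest to `pos`, which makes the index unique again.

```python
import pandas as pd

# Hypothetical toy input: phenotype "P1" lies inside two customized cis windows,
# "P2" inside one. Column names and values are assumptions for illustration.
pos_df = pd.DataFrame({
    "ID":    ["P1", "P1", "P2"],
    "chr":   ["chr21", "chr21", "chr21"],
    "pos":   [150, 150, 500],
    "start": [100, 140, 400],
    "end":   [200, 400, 600],
})

phenotype_id = "ID"  # name of the phenotype column, as in the snippet above

# deviate_ratio = |0.5 - relative position of pos within [start, end]|;
# 0 means pos sits exactly in the middle of the window.
dedup = (
    pos_df.assign(deviate_ratio=abs(0.5 - (pos_df.pos - pos_df.start) / (pos_df.end - pos_df.start)))
    .sort_values(by=[phenotype_id, "deviate_ratio"])
    .drop_duplicates(subset=phenotype_id, keep="first")  # keep the most central window per phenotype
    .set_index(phenotype_id)[["chr", "start", "end"]]
)

print(dedup)
```

Note that this approach discards every other window covering the same phenotype instead of testing it, which is why removing the step exposes the duplicate-index problem.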
When a phenotype_id is covered by multiple extended cis windows, the following error occurs:
cis-QTL mapping: nominal associations for all variant-phenotype pairs
* 389 samples
* 385 phenotypes
* 36 covariates
* 1324895 variants
* applying in-sample 0.006426735218508998 MAF filter
* cis-window: ±0
Traceback (most recent call last):
File "<stdin>", line 2, in <module>
File "/opt/conda/envs/TensorQTL/lib/python3.10/site-packages/tensorqtl/cis.py", line 211, in map_nominal
igc = genotypeio.InputGeneratorCis(genotype_df, variant_df, phenotype_df, phenotype_pos_df, group_s=group_s, window=window)
File "/opt/conda/envs/TensorQTL/lib/python3.10/site-packages/tensorqtl/genotypeio.py", line 391, in __init__
assert (phenotype_df.index == phenotype_df.index.unique()).all()
File "/opt/conda/envs/TensorQTL/lib/python3.10/site-packages/pandas/core/ops/common.py", line 69, in new_method
return method(self, other)
File "/opt/conda/envs/TensorQTL/lib/python3.10/site-packages/pandas/core/arraylike.py", line 32, in __eq__
return self._cmp_method(other, operator.eq)
File "/opt/conda/envs/TensorQTL/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 6042, in _cmp_method
raise ValueError("Lengths must match to compare")
ValueError: Lengths must match to compare
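The assertion in genotypeio.py never gets to evaluate to False. Comparing an Index with `index.unique()` is element-wise, so once duplicates make the two lengths differ, pandas raises `ValueError` instead. The following is a minimal reproduction with a made-up index, plus a length-independent duplicate check:

```python
import pandas as pd

# Toy index with a duplicated phenotype ID, mimicking phenotype_df after the merge
idx = pd.Index(["P1", "P1", "P2"], name="ID")

raised, message = False, None
try:
    # Same expression as the assert in InputGeneratorCis: element-wise index vs. its unique values
    (idx == idx.unique()).all()
except ValueError as err:
    raised, message = True, str(err)

# Index.is_unique answers the same question without requiring equal lengths
has_duplicates = not idx.is_unique
duplicated_ids = idx[idx.duplicated()].unique().tolist()

print(raised, message, has_duplicates, duplicated_ids)
```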
This is caused by duplicate entries in phenotype_df after it is merged with the customized cis windows, as shown below.
>>> phenotype_df.index.unique
<bound method Index.unique of Index(['chr21_ENSG00000154654_O15394', 'chr21_ENSG00000142192_P05067',
'chr21_ENSG00000156261_P50990', 'chr21_ENSG00000159082_O43426',
'chr21_ENSG00000159131_P22102', 'chr21_ENSG00000205726_Q15811',
'chr21_ENSG00000160209_O00764', 'chr21_ENSG00000141959_P17858',
'chr21_ENSG00000160305_Q14689'],
dtype='object', name='ID')>
>>> phenotype_df.index.unique
<bound method Index.unique of Index(['chr21_ENSG00000154654_O15394', 'chr21_ENSG00000154654_O15394',
'chr21_ENSG00000154654_O15394', 'chr21_ENSG00000154654_O15394',
'chr21_ENSG00000142192_P05067', 'chr21_ENSG00000142192_P05067',
'chr21_ENSG00000142192_P05067', 'chr21_ENSG00000142192_P05067',
'chr21_ENSG00000142192_P05067', 'chr21_ENSG00000142192_P05067',
...
'chr21_ENSG00000160305_Q14689', 'chr21_ENSG00000160305_Q14689',
'chr21_ENSG00000160305_Q14689', 'chr21_ENSG00000160305_Q14689',
'chr21_ENSG00000160305_Q14689', 'chr21_ENSG00000160305_Q14689',
'chr21_ENSG00000160305_Q14689', 'chr21_ENSG00000160305_Q14689',
'chr21_ENSG00000160305_Q14689', 'chr21_ENSG00000160305_Q14689'],
dtype='object', name='ID', length=385)>
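`phenotype_df.index.unique` without parentheses returns the bound method, so the output above shows the whole index rather than its unique values. Calling `.unique()` or `.value_counts()` gives the intended view. The duplication arises because each phenotype gets one row per region-list window that covers it. The following is a minimal sketch with made-up IDs and windows. It assumes a plain pandas merge on chromosome followed by a position filter, which is not necessarily how the original pipeline builds the merge:

```python
import pandas as pd

# Hypothetical phenotype positions (one row per phenotype)
pheno = pd.DataFrame({"ID": ["P1", "P2"], "chr": ["chr21", "chr21"], "pos": [150, 500]})

# Hypothetical region list: the first two windows both cover position 150
regions = pd.DataFrame({"chr": ["chr21"] * 3, "start": [100, 140, 400], "end": [200, 400, 600]})

# Pair every phenotype with every window on the same chromosome,
# then keep only the pairs where the phenotype lies inside the window
merged = pheno.merge(regions, on="chr")
merged = merged[(merged.pos >= merged.start) & (merged.pos <= merged.end)].set_index("ID")

counts = merged.index.value_counts()        # rows per phenotype ID
unique_ids = merged.index.unique().tolist()  # note the parentheses
print(counts)
print(unique_ids)
```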
hsun3163 commented
Fixed