cumc / xqtl-protocol

Molecular QTL analysis protocol developed by ADSP Functional Genomics Consortium

Home Page: https://cumc.github.io/xqtl-protocol/

TensorQTL Cis not able to handle multiple cis windows in the regionlist covering the same position.

hsun3163 opened this issue

The problem appeared after the following code was removed:

    # deviate_ratio measures how far pos deviates from the middle of the customized cis window.
    phenotype_pos_df = phenotype_pos_df.assign(
        deviate_ratio = abs(0.5 - (phenotype_pos_df.pos - phenotype_pos_df.start) / (phenotype_pos_df.end - phenotype_pos_df.start))
    ).sort_values(by = [phenotype_id, "deviate_ratio"]).drop_duplicates(phenotype_id, keep='first').set_index(phenotype_id)[["chr", "start", "end"]]
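To make the effect of that snippet concrete, here is a minimal sketch on toy data. The IDs, coordinates, and the name `dedup` are made up for illustration; only the chain of pandas calls mirrors the removed code. For each phenotype it keeps the single window whose midpoint is closest to `pos`, so the resulting index is unique.

```python
import pandas as pd

# Hypothetical toy input: geneA is covered by two customized cis windows.
phenotype_id = "ID"
phenotype_pos_df = pd.DataFrame({
    "ID":    ["geneA", "geneA", "geneB"],
    "chr":   ["chr21", "chr21", "chr21"],
    "pos":   [1500, 1500, 5000],
    "start": [1000, 1400, 4000],
    "end":   [2000, 3400, 6000],
})

dedup = (
    phenotype_pos_df
    # Distance of pos from the window midpoint, as a fraction of window length.
    .assign(deviate_ratio=lambda d: (0.5 - (d["pos"] - d["start"]) / (d["end"] - d["start"])).abs())
    # Within each phenotype, put the best-centered window first ...
    .sort_values(by=[phenotype_id, "deviate_ratio"])
    # ... and keep only that one.
    .drop_duplicates(phenotype_id, keep="first")
    .set_index(phenotype_id)[["chr", "start", "end"]]
)
print(dedup)
```

For geneA the window 1000-2000 has a deviate_ratio of 0 and the window 1400-3400 has 0.45, so only the first one survives. Without this step, every phenotype keeps one row per overlapping window.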

When a phenotype_id is covered by multiple extended cis windows, the following error occurs:

cis-QTL mapping: nominal associations for all variant-phenotype pairs
  * 389 samples
  * 385 phenotypes
  * 36 covariates
  * 1324895 variants
  * applying in-sample 0.006426735218508998 MAF filter
  * cis-window: ±0
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/opt/conda/envs/TensorQTL/lib/python3.10/site-packages/tensorqtl/cis.py", line 211, in map_nominal
    igc = genotypeio.InputGeneratorCis(genotype_df, variant_df, phenotype_df, phenotype_pos_df, group_s=group_s, window=window)
  File "/opt/conda/envs/TensorQTL/lib/python3.10/site-packages/tensorqtl/genotypeio.py", line 391, in __init__
    assert (phenotype_df.index == phenotype_df.index.unique()).all()
  File "/opt/conda/envs/TensorQTL/lib/python3.10/site-packages/pandas/core/ops/common.py", line 69, in new_method
    return method(self, other)
  File "/opt/conda/envs/TensorQTL/lib/python3.10/site-packages/pandas/core/arraylike.py", line 32, in __eq__
    return self._cmp_method(other, operator.eq)
  File "/opt/conda/envs/TensorQTL/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 6042, in _cmp_method
    raise ValueError("Lengths must match to compare")
ValueError: Lengths must match to compare

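This is caused by duplicate entries in phenotype_df after it is merged with the customized cis windows, as indicated below: the first index listing holds 9 distinct IDs, while the second holds 385 entries with IDs repeated.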

>>> phenotype_df.index.unique
<bound method Index.unique of Index(['chr21_ENSG00000154654_O15394', 'chr21_ENSG00000142192_P05067',
       'chr21_ENSG00000156261_P50990', 'chr21_ENSG00000159082_O43426',
       'chr21_ENSG00000159131_P22102', 'chr21_ENSG00000205726_Q15811',
       'chr21_ENSG00000160209_O00764', 'chr21_ENSG00000141959_P17858',
       'chr21_ENSG00000160305_Q14689'],
      dtype='object', name='ID')>
>>> phenotype_df.index.unique
<bound method Index.unique of Index(['chr21_ENSG00000154654_O15394', 'chr21_ENSG00000154654_O15394',
       'chr21_ENSG00000154654_O15394', 'chr21_ENSG00000154654_O15394',
       'chr21_ENSG00000142192_P05067', 'chr21_ENSG00000142192_P05067',
       'chr21_ENSG00000142192_P05067', 'chr21_ENSG00000142192_P05067',
       'chr21_ENSG00000142192_P05067', 'chr21_ENSG00000142192_P05067',
       ...
       'chr21_ENSG00000160305_Q14689', 'chr21_ENSG00000160305_Q14689',
       'chr21_ENSG00000160305_Q14689', 'chr21_ENSG00000160305_Q14689',
       'chr21_ENSG00000160305_Q14689', 'chr21_ENSG00000160305_Q14689',
       'chr21_ENSG00000160305_Q14689', 'chr21_ENSG00000160305_Q14689',
       'chr21_ENSG00000160305_Q14689', 'chr21_ENSG00000160305_Q14689'],
      dtype='object', name='ID', length=385)>
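The failure can be reproduced outside TensorQTL. The sketch below uses a made-up `phenotype_df` with repeated IDs and runs the same comparison as the assertion in the traceback. Because the full index and its `unique()` version have different lengths, pandas raises instead of returning False. Note that `unique` has to be called with parentheses; without them the console only prints the bound method, as in the listings above. The last line shows one possible way to drop repeated rows, which is an assumption for illustration and not necessarily the fix that was applied in the protocol.

```python
import pandas as pd

# Hypothetical phenotype table whose index repeats IDs, as happens after the merge.
phenotype_df = pd.DataFrame(
    {"sample1": [0.1, 0.2, 0.3, 0.4, 0.5]},
    index=pd.Index(["geneA", "geneA", "geneB", "geneC", "geneC"], name="ID"),
)

# Cheap checks that do not raise.
print(phenotype_df.index.is_unique)
print(len(phenotype_df.index), len(phenotype_df.index.unique()))

# Same comparison as in the traceback: the lengths differ, so pandas raises.
try:
    (phenotype_df.index == phenotype_df.index.unique()).all()
    error_message = None
except ValueError as err:
    error_message = str(err)
print(error_message)

# One possible workaround (an assumption): keep the first row per ID.
phenotype_df_unique = phenotype_df[~phenotype_df.index.duplicated(keep="first")]
```

Checking `index.is_unique` before calling `map_nominal` gives a clearer signal than the length-mismatch error.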

@zq2209 was this code removed because of some other problem?

Fixed