zeeev / wham

Structural variant detection and association testing

classify_WHAM_vcf.py issue

JuanmaMedina opened this issue

Good morning Zev,

After successfully running the WHAMG script, I tried to perform the classification step with classify_WHAM_vcf.py, but I ran into a couple of errors at this step:

  • The first requires a slight modification of the Python script's source code: the cross_validation module seems to be deprecated, and it is advised to import the corresponding functions from model_selection instead (see https://stackoverflow.com/questions/30667525/importerror-no-module-named-sklearn-cross-validation for further details). This can be easily fixed by replacing line 6 of the script with:
    from sklearn.model_selection import cross_val_score

  • I have not solved the second problem, which is the main reason I am opening this issue:

Traceback (most recent call last):
  File "/home/genomica/bin/wham/utils/classify_WHAM_vcf.py", line 333, in <module>
    for r in results:
  File "/home/genomica/anaconda3/envs/wham/lib/python2.7/multiprocessing/pool.py", line 673, in next
    raise value
KeyError: 'AT'
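
The traceback above ends inside multiprocessing/pool.py because the exception was raised in a worker process and then re-raised in the parent, so the line that actually failed is not shown. A minimal sketch of one way to locate it is to wrap the worker so that it returns its own formatted traceback. The names `traced` and `classify_record` are hypothetical and are not taken from classify_WHAM_vcf.py, and the sample records are made up for illustration.

```python
from __future__ import print_function
import functools
import traceback


def traced(func):
    """Wrap a worker so a failure comes back as (None, traceback text)."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs), None
        except Exception:
            # format_exc() keeps the file and line where the error was raised
            return None, traceback.format_exc()
    return wrapper


@traced
def classify_record(info):
    # Hypothetical stand-in for the script's per-record worker:
    # it looks up a key that may be absent from the record.
    return len(info["AT"])


if __name__ == "__main__":
    records = [{"AT": "0.1,0.2"}, {"SVTYPE": "DEL"}]  # made-up data
    # Run serially here; the wrapped function is intended to be handed
    # to a multiprocessing pool in the same way as the original worker.
    for value, err in map(classify_record, records):
        if err is not None:
            print(err)
```

Running the records serially like this, before involving the pool at all, is often the quickest way to see which record and which lookup trigger the KeyError.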

Could you give me a hint here? For reference, this is the conda environment in which I am running the script:

# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main  
blas                      1.0                         mkl  
ca-certificates           2019.5.15                     0  
certifi                   2019.6.16                py27_1  
intel-openmp              2019.4                      243  
libedit                   3.1.20181209         hc058e9b_0  
libffi                    3.2.1                hd88cf55_4  
libgcc-ng                 9.1.0                hdf63c60_0  
libgfortran-ng            7.3.0                hdf63c60_0  
libstdcxx-ng              9.1.0                hdf63c60_0  
mkl                       2019.4                      243  
mkl_fft                   1.0.12           py27ha843d7b_0  
mkl_random                1.0.2            py27hd81dba3_0  
ncurses                   6.1                  he6710b0_1  
numpy                     1.16.4           py27h7e9f1db_0  
numpy-base                1.16.4           py27hde5b4d6_0  
openssl                   1.1.1c               h7b6447c_1  
pip                       19.1.1                   py27_0  
python                    2.7.16               h9bab390_0  
readline                  7.0                  h7b6447c_5  
scikit-learn              0.20.3           py27hd81dba3_0  
scipy                     1.2.1            py27h7c811a0_0  
setuptools                41.0.1                   py27_0  
sqlite                    3.29.0               h7b6447c_0  
tk                        8.6.8                hbc83047_0  
wheel                     0.33.4                   py27_0  
zlib                      1.2.11               h7b6447c_3  
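
The list above shows scikit-learn 0.20.3, a release in which the sklearn.cross_validation module is no longer available, which is consistent with the import problem described in the first point. For a script that has to run against both older and newer releases, a fallback import is one option. The helper `import_first` below is a hypothetical name, not part of the script or of scikit-learn, and it is exercised with standard-library modules so that the sketch runs without scikit-learn installed.

```python
from __future__ import print_function
import importlib


def import_first(module_names, attribute):
    """Return `attribute` from the first importable module in `module_names`."""
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            continue  # module missing in this environment; try the next one
        if hasattr(module, attribute):
            return getattr(module, attribute)
    raise ImportError("%s not found in any of %s" % (attribute, module_names))


# Demonstration with standard-library names only: the first module does not
# exist, so the lookup falls through to json.
dumps = import_first(["no_such_module_for_demo", "json"], "dumps")
print(dumps({"ok": True}))

# For the script, the call would look like this (requires scikit-learn):
# cross_val_score = import_first(
#     ["sklearn.model_selection", "sklearn.cross_validation"],
#     "cross_val_score")
```

Listing model_selection first means current releases are served by the supported module, and the older location is only tried when the newer one is absent.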

Thanks in advance!

@JuanmaMedina,

Depending on your use case, whamG might be a better choice. WhamG classifies structural variant types internally and is generally more accurate. However, if you need to classify wham calls, I'll revisit this issue.

Hello Zev,

Thanks for your fast response.

Yes, I did use the whamg script, as recommended in the documentation. I edited the OP in case that was not clear.

I parsed the resulting .vcf to extract the SV annotation information. However, I was trying to run classify_WHAM_vcf.py on the raw .vcf in case it provided any extra useful information. That was my reason for opening this issue. If the script does not actually provide more information, or is not suitable to run after a whamg run, maybe we can close this, although I think other users could hit the same error.
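
One way to check whether a given .vcf is suitable input is to count how many records carry the key that raised the error. The sketch below assumes, without confirmation from the script's source, that the KeyError: 'AT' refers to a missing INFO field named AT. The function names and the two sample records are made up for illustration.

```python
from __future__ import print_function


def info_keys(vcf_line):
    """Return the set of INFO keys in one tab-separated VCF data line."""
    info = vcf_line.rstrip("\n").split("\t")[7]  # INFO is the 8th column
    return set(field.split("=", 1)[0] for field in info.split(";") if field)


def count_missing_info_key(lines, key):
    """Count data records (non-header lines) whose INFO column lacks `key`."""
    missing = 0
    total = 0
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        total += 1
        if key not in info_keys(line):
            missing += 1
    return missing, total


if __name__ == "__main__":
    sample = [
        "##fileformat=VCFv4.2\n",
        "chr1\t100\t.\tN\t<DEL>\t.\t.\tSVTYPE=DEL;SVLEN=-50\n",
        "chr1\t900\t.\tN\t<DUP>\t.\t.\tSVTYPE=DUP;AT=0.1,0.2\n",
    ]
    missing, total = count_missing_info_key(sample, "AT")
    print("%d of %d records lack AT" % (missing, total))
```

For a real file, an open file handle can be passed in place of the sample list. If every record lacks the key, that would suggest the file is simply not the kind of input the classifier expects.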

Cheers.