zeeev / wham

Structural variant detection and association testing

Issue with parenthesis in print, using python 3

sahilseth opened this issue · comments

Here is an example error:

 File "utils/classify_WHAM_vcf.py", line 46
    print '##INFO=<ID=WC,Number=1,Type=String,Description="WHAM classifier variant type">'
                                                                                         ^
SyntaxError: Missing parentheses in call to 'print'
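
For context, this error arises because `print` is a statement in Python 2 but a function in Python 3. A minimal sketch of a form that runs under both interpreters, using the header line from the traceback above (the `wc_header` helper name is hypothetical and not part of WHAM):

```python
from __future__ import print_function  # harmless on Python 3, enables print() on Python 2


def wc_header():
    """Return the WC INFO header line shown in the traceback above."""
    return ('##INFO=<ID=WC,Number=1,Type=String,'
            'Description="WHAM classifier variant type">')


# Python 2 only:    print wc_header()
# Python 2 and 3:   print(wc_header())
print(wc_header())
```

Every `print '...'` statement in the script would need the same change before it could run under Python 3.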

Are you using WHAM for SV discovery? If so, I'd advise you to use WHAM-GRAPHENING -k.

I have a tumor normal pair, and would like to explore translocations.

Thanks,
Sahil

On Apr 12, 2016, at 4:26 PM, Zev Kronenberg notifications@github.com wrote:

Are you using WHAM for SV discovery? If so just i'd advise you to use WHAM-GRAPHENING -k.

That'd be the correct use case for WHAM. Did the classifier run for you after changing the quote?

@Sahil, your current set of errors stems from syntax incompatibilities between Python 2.x and Python 3. Unfortunately, I had not tested the code against any version of Python 3 (I just noticed that the docs claim Python 3 is supported, so I apologize for that error).

Do you happen to have Python 2.7 installed on your system? You can have multiple versions of Python on one machine, so running the classifier under Python 2.7 is probably your easiest fix. I can also try to port the code to Python 3, but I expect that will take up to a week.
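
One way to make the version requirement explicit is an interpreter check at the top of a script. The following is only a sketch under assumptions: `require_python2` is a hypothetical helper, not something in the WHAM utilities.

```python
import sys


def require_python2(version_info=sys.version_info):
    """Return an error message if the interpreter is not Python 2.7, else None.

    Hypothetical helper for illustration; not part of the WHAM utilities.
    """
    major, minor = version_info[0], version_info[1]
    if (major, minor) != (2, 7):
        return ("This script expects Python 2.7, but found %d.%d"
                % (major, minor))
    return None


message = require_python2()
if message is not None:
    # A real script might call sys.exit(message) here instead of printing.
    print(message)
```

Note that such a check only helps if the file itself parses under both versions; a Python 2 `print` statement anywhere in the same file raises the SyntaxError before the check can run.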

My recommendation would be to install the Anaconda distribution of Python 2.7 from https://www.continuum.io/downloads; it comes with all of the packages you need to run the classifier and will not overwrite the default Python installed on your machine.

-EJ

On Apr 12, 2016, at 3:01 PM, Zev Kronenberg notifications@github.com wrote:

That'd be the correct use case for WHAM. Did the classifier run for you after changing the quote?

Yes, I just created a new python2 env, and compiled again to be sure. Now it works, but shows a lot of warnings:

python2.7/site-packages/sklearn/utils/validation.py:386: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and willraise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
  DeprecationWarning)
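
The warning says that a single sample should be passed as a 2D structure of shape (1, n_features) rather than as a flat 1D sequence. A minimal sketch of that wrapping with plain lists, assuming the classifier is handed one variant's feature vector at a time (the `as_single_sample` helper and the feature values are hypothetical); with a NumPy array, `X.reshape(1, -1)` as quoted in the warning has the same effect:

```python
def as_single_sample(features):
    """Wrap a flat feature sequence as one row: shape (1, n_features)."""
    return [list(features)]


flat = [0.17, 23.0, 13.5]   # made-up feature values for one variant
row = as_single_sample(flat)
print(row)                  # one row holding three feature columns
```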

I do see the file is getting created, here is the first call:

# wham bam
1       10004   .       N       AACCCCNANCCCNACCCCAACCCCCACCCCAN        .       .       LRT=0;WAF=1,0.500001,0.750001;GC=1,1;AT=1,0.0434783,0.0434783,0,0,0,0,0,0,0,0,0,0.0434783,0,0.475082;CF=0.173913;CISTART=9965,10041;CIEND=10203,10203;PU=3;SU=0;CU=13;RD=23;NC=3;MQ=13.4783;MQF=1;SP=1,0,0;CHR2=1;DI=f;END=10204;SVLEN=201      GT:GL:NR:NA:NS:RD       0/1:-49.7373,-13.8629,-19.7461:2:18:18:20       1/1:-255,-255,-0.374464:0:3:3:3
# classifier:
1       10004   .       N       AACCCCNANCCCNACCCCAACCCCCACCCCAN        .       .       LRT=0;WAF=1,0.500001,0.750001;GC=1,1;AT=1,0.0434783,0.0434783,0,0,0,0,0,0,0,0,0,0.0434783,0,0.475082;CF=0.173913;CISTART=9965,10041;CIEND=10203,10203;PU=3;SU=0;CU=13;RD=23;NC=3;MQ=13.4783;MQF=1;SP=1,0,0;CHR2=1;DI=f;END=10204;SVLEN=201;WC=INR;WP=0.254,0.158,0.372,0.216    GT:GL:NR:NA:NS:RD       0/1:-49.7373,-13.8629,-19.7461:2:18:18:20       1/1:-255,-255,-0.374464:0:3:3:3

interpretation
WC=INR; this probably means insertion.
WP=0.254,0.158,0.372,0.216: not sure of the sequence of probabilities.

Sorting the last column of the training data lexicographically, I get: DEL, DUP, INR, INV. In this example, the variant was classified as INR with a probability of 0.372, which is the highest of the four. So can I assume that the probabilities are also ordered DEL, DUP, INR, INV?

info from docs
WP:
The probabilities for each class label generated by the random forest classifier.
The format field is comprised of six colon-delimited fields.

This is comma-separated, and the number of values depends on the training data supplied; am I getting this right?

thanks!

@sahilseth That is correct.
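
Under the ordering discussed above, the WP values can be paired with their class labels when reading a classified record. This is a rough sketch, assuming the labels are DEL, DUP, INR, INV in sorted order as in this thread; with different training data the label list would need to change. The helper names `parse_info` and `class_probabilities` are made up for illustration.

```python
def parse_info(info):
    """Split a VCF INFO string into a dict; flag entries map to True."""
    fields = {}
    for entry in info.split(';'):
        if not entry:
            continue
        key, sep, value = entry.partition('=')
        fields[key] = value if sep else True
    return fields


def class_probabilities(info, labels=('DEL', 'DUP', 'INR', 'INV')):
    """Pair WP values with labels, assumed to be in sorted label order."""
    fields = parse_info(info)
    probs = [float(p) for p in fields['WP'].split(',')]
    if len(probs) != len(labels):
        raise ValueError('WP has %d values but %d labels were given'
                         % (len(probs), len(labels)))
    return fields['WC'], dict(zip(labels, probs))


# Tail of the INFO column from the classified record above.
info = 'END=10204;SVLEN=201;WC=INR;WP=0.254,0.158,0.372,0.216'
call, probs = class_probabilities(info)
best = max(probs, key=probs.get)
print(call, best, probs[best])
```

For the record in this thread, the label with the highest WP value matches the WC call.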

@ejodude Any movement on this EJ?

Thanks @sahilseth for the heads up. It looks like the code is running fine, but we will need to add an update before scikit-learn moves to v0.19. I've also changed the wiki to highlight the requirement for Python 2.7 rather than 3.0+.