kipoi / models

Model zoo for genomics

Home page: http://kipoi.org



Specifying targets in DeepSEA

pjshort opened this issue

Hello,

Thanks for all of your hard work on this! I see in the documentation (http://kipoi.org/models/DeepSEA/variantEffects/) that you can specify a subset of targets rather than the full set of 919.

If my understanding of this is correct, do you have any examples showing the format of the target file? Thank you!

The main reason for this request is that, when running on a VCF file, the output grows dramatically (919 predictions are added to every line of the VCF) and the run time is very long: ~1000 predictions took 4 hours on the computing cluster I am using, although this may also be because I am using pytorch-cpu rather than a GPU.

Since I am only interested in a subset of the transcription factors/cell types anyway, I thought selecting a small subset could reduce both the run time and the file size.

If you are able to add an alternative way to do it, that would be great. Thank you so much!

Sure, I will do that. By the way, if you have GPUs available, you should use the `--gpu` flag when you create the environment with `kipoi env ...`. Alternatively, as a simple fix for your existing environment, just run `conda install pytorch torchvision -c pytorch` in that environment. If it is still very slow, then the bottleneck is writing the output file...
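A quick way to confirm that the PyTorch build inside the environment can actually see a GPU is the standard `torch.cuda.is_available()` call. The snippet below is a minimal sketch, not part of kipoi; it assumes it is run inside the kipoi-created environment and simply reports `False` if PyTorch is missing or is a CPU-only build.

```python
# Check whether the PyTorch installed in this environment can use a CUDA GPU.
# Not a kipoi tool: just a plain PyTorch call, guarded in case torch is absent.
try:
    import torch
except ImportError:
    torch = None


def gpu_available() -> bool:
    """True only if torch is importable and reports a usable CUDA device."""
    return torch is not None and torch.cuda.is_available()


print("CUDA available:", gpu_available())
```

If this prints `False` on a GPU node, the environment most likely still has the CPU-only PyTorch build.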

Thank you! I'll give it a try using the --gpu flag and let you know how it works!

@krrome, the output selection is now implemented in kipoi_veff, right?

True, the requested functionality is now available in the CLI through `--model_outputs`, which takes the string identifiers of the model outputs, or `--model_outputs_i`, which takes their 0-based integer indices.
If you use the Python API, you can use the `output_filter` kwarg of `score_variants` in exactly the same way: single values, lists of values, and boolean output selection are all allowed.
To see what the string identifiers of the DeepSEA model outputs are, take a look at `DeepSEA/variantEffects/predictor_names.txt`. By the way, these are defined in model.yaml under schema > targets > column_labels.
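To make the three selection styles concrete, here is a small sketch of how a selection (a single label or index, a list of labels or indices, or a boolean mask) maps onto 0-based output positions. `load_labels` and `resolve_outputs` are hypothetical helpers written for illustration, not kipoi_veff's implementation, and the one-label-per-line layout of `predictor_names.txt` is an assumption. The labels used here are made up.

```python
from typing import List, Sequence, Union

# A selection: one label/index, or a sequence of labels, indices, or booleans.
Selection = Union[str, int, Sequence[Union[str, int, bool]]]


def load_labels(lines: Sequence[str]) -> List[str]:
    """Parse output labels, ASSUMING one label per line (as in predictor_names.txt)."""
    return [line.strip() for line in lines if line.strip()]


def resolve_outputs(labels: List[str], selection: Selection) -> List[int]:
    """Hypothetical helper: map a selection to 0-based output indices."""
    # Wrap single values so they are handled like one-element lists.
    if isinstance(selection, (str, int)):
        selection = [selection]
    items = list(selection)

    # A list made only of booleans is a mask with one entry per output.
    if items and all(isinstance(x, bool) for x in items):
        if len(items) != len(labels):
            raise ValueError("boolean mask needs one entry per model output")
        return [i for i, keep in enumerate(items) if keep]

    indices = []
    for x in items:
        if isinstance(x, str):
            indices.append(labels.index(x))  # ValueError if the label is unknown
        elif isinstance(x, int):
            if not 0 <= x < len(labels):
                raise IndexError(f"output index {x} is out of range")
            indices.append(x)
        else:
            raise TypeError(f"unsupported selection item: {x!r}")
    return indices


# Made-up labels for illustration; real DeepSEA names come from predictor_names.txt.
labels = load_labels(["cellA_CTCF\n", "cellB_CTCF\n", "cellA_DNase\n"])
print(resolve_outputs(labels, "cellB_CTCF"))         # [1]
print(resolve_outputs(labels, [0, 2]))               # [0, 2]
print(resolve_outputs(labels, [True, False, True]))  # [0, 2]
# Indices of every output whose label mentions CTCF:
print([i for i, name in enumerate(labels) if "CTCF" in name])  # [0, 1]
```

Filtering labels by substring like this is one way to find the 0-based indices of, for example, all outputs for one transcription factor before choosing between the string-based and index-based options.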

In order to use this, you will have to install the new version of kipoi and set up the environment with `kipoi env create --vep --gpu DeepSEA/variantEffects`. Let me know if you encounter any problems.

Hi @krrome and @Avsecz - thanks so much for all of your help!

I have this up and running now and the --gpu flag is way faster than running with pytorch-cpu. The --model_outputs and --model_outputs_i flags also work for me.

Interestingly, selecting a small number of outputs doesn't seem to speed things up dramatically (at least not compared to switching from CPU to GPU).

Thanks again for all of your help - I think this can be closed!

Selecting model outputs only reduces disk space and writing time. Multi-task models like DeepSEA always compute all outputs; only the selected ones are then saved. Happy to hear that it works.
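A toy sketch in plain Python shows why output selection saves disk space and writing time but not compute. `run_multitask_model` is a hypothetical stand-in for a multi-task network such as DeepSEA, not kipoi code: it always produces every output, and the selection is applied only afterwards.

```python
from typing import List, Sequence


def run_multitask_model(batch: Sequence[str], n_outputs: int) -> List[List[float]]:
    """Hypothetical stand-in for a multi-task network such as DeepSEA.

    Every forward pass yields one score per task for every input, so its cost
    does not depend on which outputs are kept afterwards.
    """
    return [[float(i * n_outputs + j) for j in range(n_outputs)]
            for i in range(len(batch))]


def predict_and_filter(batch: Sequence[str], n_outputs: int,
                       keep: Sequence[int]) -> List[List[float]]:
    """Run the full model, then keep only the selected output columns."""
    full = run_multitask_model(batch, n_outputs)  # all n_outputs are computed
    return [[row[j] for j in keep] for row in full]  # only these get written


rows = predict_and_filter(["variant1", "variant2"], n_outputs=919, keep=[0, 5])
print(rows)  # [[0.0, 5.0], [919.0, 924.0]]
```

The model call is identical whether one output or all 919 are kept; only the amount of data written per variant changes.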