rdk / p2rank

P2Rank: Protein-ligand binding site prediction tool based on machine learning. Stand-alone command line program / Java library for predicting ligand binding pockets from protein structure.

Home Page: https://rdk.github.io/p2rank/

easy way to get features

rohanvarm opened this issue · comments

Is there an easy way to access the feature descriptor of the SAS points?

commented

Hi, and sorry for the late reply. Currently, there is no straightforward way to do this.

If you are still interested, could you say something more about your use case?

In a future release, I plan to add:

  1. an easy way to export the feature vectors of all SAS points for an individual protein to a CSV file
  2. the ability to visualize individual features mapped onto SAS points in PyMOL

For now, you can export the feature vectors to an ARFF file (basically a CSV with a header), but it is a hassle: it is only possible during the training phase, and only for the whole dataset at once. So you can start a fake training run on a single-protein dataset with -delete_vectors 0:

prank traineval -train test.ds -eval test.ds -delete_vectors 0 -extra_features xyz

# Notes:
#  * test.ds should contain the path to a single PDB/CIF file only
#  * the xyz feature adds the (x,y,z) coordinates of the SAS point to the feature vector (optional)
#  * a vectorsTrain.arff.gz file will be produced in the output folder
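
Since the exported file is described as basically a CSV with a header, it can probably be turned into a plain CSV with a few lines of Python. The following is only a sketch, not part of P2Rank: it assumes the file is a dense, Weka-style ARFF (one `@attribute <name> <type>` line per column, then `@data` followed by comma-separated rows) and that attribute names contain no spaces. The function names `read_arff_gz` and `arff_gz_to_csv` are made up for this example.

```python
import csv
import gzip


def read_arff_gz(path):
    """Return (attribute_names, rows) from a gzipped, dense ARFF file.

    Assumption: one "@attribute <name> <type>" line per column and
    comma-separated value rows after "@data". Values are kept as strings.
    """
    names, rows = [], []
    in_data = False
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for raw in fh:
            line = raw.strip()
            if not line or line.startswith("%"):
                continue  # skip blank lines and ARFF comments
            if in_data:
                rows.append(line.split(","))
            elif line.lower().startswith("@attribute"):
                # keep only the attribute name, dropping optional quotes
                names.append(line.split(None, 2)[1].strip("'\""))
            elif line.lower().startswith("@data"):
                in_data = True
    return names, rows


def arff_gz_to_csv(src, dst):
    """Write the ARFF content to a CSV file; return the number of data rows."""
    names, rows = read_arff_gz(src)
    with open(dst, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(names)  # header row built from the attribute names
        writer.writerows(rows)
    return len(rows)
```

Calling `arff_gz_to_csv("vectorsTrain.arff.gz", "vectors.csv")` from inside the output folder of the run should then give one row per SAS point, with the x, y, z columns included if `-extra_features xyz` was used. The output file name `vectors.csv` is arbitrary.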

Thanks, that is useful! I intended to use this for another downstream application.