rdk / p2rank

P2Rank: Protein-ligand binding site prediction tool based on machine learning. Stand-alone command line program / Java library for predicting ligand binding pockets from protein structure.

Home Page: https://rdk.github.io/p2rank/

How to train with conservation (Add parameter -conservation_dir_train)

skodapetr opened this issue · comments

Commands such as

./prank.sh eval-predict ../p2rank-datasets/coach420.ds \
    -c distro/config/conservation \
    -conservation_dir 'coach420/conservation/e5i1/scores' 

can be used to evaluate performance on a dataset using conservation scores.

I would expect the train command to be similar, i.e.

.\p2rank\prank.bat traineval \
    -t .\datasets\chen11.ds \
    -e .\datasets\joined(mlig).ds \
    -threads 4 -rf_trees 128 -rf_depth 6 -delete_models 0 -loop 1 -seed 42 \
    -c distro/config/conservation \
    -conservation_dir ...

However, it is not clear how to specify two conservation directories as inputs, i.e. one for chen11 and one for the joined dataset.

Or do I have to merge all files into a single directory?

commented

Unfortunately, there is currently no way to specify separate conservation directories for the training and test datasets. I will add the -conservation_dir_train parameter in the next version.

In the meantime, as a workaround, you can use one of two approaches:

  1. create a single directory with conservation scores for both datasets
  2. put the conservation score files in the same directory as the PDB files
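
The first approach can be scripted. The following is only a minimal sketch, not part of P2Rank: the function name merge_conservation_dirs and every path in the usage comment are hypothetical, and it assumes that score file names do not collide between the two datasets. Colliding names are skipped with a warning rather than overwritten.

```shell
#!/usr/bin/env bash
# Sketch: copy conservation score files from several source directories
# into one target directory. Function name and paths are hypothetical.
merge_conservation_dirs() {
    local target="$1"
    shift
    mkdir -p "$target"
    local src f name
    for src in "$@"; do
        for f in "$src"/*; do
            # Skip subdirectories and the literal pattern left by an empty glob.
            [ -f "$f" ] || continue
            name=$(basename "$f")
            if [ -e "$target/$name" ]; then
                # Do not overwrite: the same file name in two datasets
                # may hold different scores.
                echo "warning: $name already in $target, skipping $f" >&2
                continue
            fi
            cp "$f" "$target/"
        done
    done
}

# Usage (hypothetical paths):
# merge_conservation_dirs conservation/merged chen11/conservation joined/conservation
```

The merged directory can then be passed as the single value of -conservation_dir.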

For the second approach, is it necessary to specify the conservation_dir?

commented

I will keep this issue open, since the -conservation_dir_train parameter should still be implemented.

commented

In version 2.2, the -conservation_dir parameter was replaced with -conservation_dirs, which can contain multiple directories.