Using DeeplyTough as an embedder

Question

vinbl opened this issue 2 years ago · comments

Hello Josh,

I am thinking of the possibility of using DeeplyTough as an embedder for protein pockets, so that each pocket is mapped to a vector of descriptors. Could you provide some guidance on how these could be obtained?

Also, is it possible to process a custom pdb as the input containing only the pocket residues, instead of relying on the automated pocket detection?

Thank you very much

Joshua Meyers · Answer 1 · Mon Feb 21 2022 20:55:39 GMT+0800 (China Standard Time)

Hello @vinbl, thanks for raising an issue! Yep, DeeplyTough could be exactly what you're looking for. Apologies for the delay.

Obtaining the descriptors for each pocket is relatively straightforward. There are many strategies, but I'll suggest the one with the fewest code changes.

A prerequisite is that you can run custom_evaluation.py as described in the README. This runs DeeplyTough pairwise pocket matching for a set of (pdb, pocket) pairs defined in pairs.csv.
Within custom_evaluation.py, building the entries dictionary involves calculating a descriptor for each pocket, so if you modify the code to save this dict somewhere you should be good to go.
The simplest way to set up pairs.csv would be to duplicate your pocket entries (it will essentially be calculating the distance between pocket 1 and pocket 1, which should be 0). This allows you to loop over just the pockets you care about without needing to modify the current interface.
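The duplicated-pocket pairs.csv could be generated with something like the sketch below. The helper name write_self_pairs is made up for illustration, and the four-column row layout (pdbA, pocketA, pdbB, pocketB) is an assumption, so check the README for the exact format custom_evaluation.py expects.

```python
import csv
from pathlib import Path


def write_self_pairs(pockets, out_path):
    """Write one CSV row per pocket, pairing each pocket with itself.

    `pockets` is an iterable of (protein_pdb, pocket_pdb) relative paths.
    ASSUMPTION: each row is pdbA, pocketA, pdbB, pocketB with no header;
    verify this against the README before relying on it.
    """
    out_path = Path(out_path)
    with out_path.open("w", newline="") as handle:
        writer = csv.writer(handle)
        for protein_pdb, pocket_pdb in pockets:
            # Same structure on both sides, so the reported distance should be 0.
            writer.writerow([protein_pdb, pocket_pdb, protein_pdb, pocket_pdb])
    return out_path
```

Calling write_self_pairs([("1abc/1abc.pdb", "1abc/1abc_pocket.pdb")], "pairs.csv") would then produce a single self-pair row; the file names here are placeholders.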

The descriptor for each pocket has a dimensionality of 128.
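Saving the descriptors and checking them afterwards might look roughly like this. It is only a sketch: it assumes entries is a list of dicts and that each dict stores its vector under a "descriptor" key, neither of which is confirmed above, so inspect the real object in custom_evaluation.py and adjust the key. All function names are hypothetical.

```python
import pickle

DESCRIPTOR_DIM = 128  # dimensionality stated in this answer


def save_entries(entries, path):
    """Pickle the entries object so the descriptors survive the run."""
    with open(path, "wb") as handle:
        pickle.dump(entries, handle)


def load_entries(path):
    """Load a previously pickled entries object."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


def collect_descriptors(entries, key="descriptor"):
    """Return one flat list of floats per entry.

    ASSUMPTION: `entries` is a list of dicts and `key` names the field
    holding a flat, length-128 vector. Both are guesses for illustration.
    """
    vectors = []
    for entry in entries:
        vector = list(entry[key])
        if len(vector) != DESCRIPTOR_DIM:
            raise ValueError(
                f"expected {DESCRIPTOR_DIM} values, got {len(vector)}"
            )
        vectors.append(vector)
    return vectors
```

In this sketch save_entries would be called from a modified custom_evaluation.py once the descriptors have been computed, and collect_descriptors(load_entries(path)) would give the list of 128-dimensional vectors to use as embeddings.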
"Is it possible to process a custom pdb as the input containing only the pocket residues, instead of relying on the automated pocket detection?" No and yes.
-- Automated pocket detection is definitely not required; you can specify your own pockets.
-- But you can't provide a specific set of residues as the protein file. DeeplyTough takes a full protein PDB and a pocket PDB, defines a 24 angstrom cube around the centroid of the pocket residues, and crops that cube from the original protein coordinates. If you provide it with a partial protein to begin with, this deviates from the training conditions and it might not perform as expected.
-- What you should do is save a custom PDB containing only the pocket residues; this is your pocket file. Then specify (originalPDB, pocketFile) in pairs.csv as described above.
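Writing such a pocket file can be done with plain text processing, as in the sketch below. This is not code from the repo: the function names are hypothetical, only ATOM records are copied, and the parsing relies on the standard fixed-column PDB layout (chain ID in column 22, residue number in columns 23-26, coordinates in columns 31-54). The centroid helper is just a sanity check on the pocket file; DeeplyTough computes the centroid itself.

```python
def write_pocket_pdb(protein_pdb, pocket_pdb, residues):
    """Copy the ATOM records of the chosen residues into a new PDB file.

    `residues` is a set of (chain_id, residue_number) tuples.
    Returns the number of atom records written.
    """
    written = 0
    with open(protein_pdb) as src, open(pocket_pdb, "w") as dst:
        for line in src:
            if not line.startswith("ATOM") or len(line) < 54:
                continue
            chain_id = line[21]          # column 22
            res_num = int(line[22:26])   # columns 23-26
            if (chain_id, res_num) in residues:
                dst.write(line)
                written += 1
        dst.write("END\n")
    return written


def pocket_centroid(pocket_pdb):
    """Mean x, y, z of all ATOM records in a pocket file."""
    coords = []
    with open(pocket_pdb) as handle:
        for line in handle:
            if line.startswith("ATOM") and len(line) >= 54:
                coords.append(
                    (float(line[30:38]), float(line[38:46]), float(line[46:54]))
                )
    if not coords:
        raise ValueError("no ATOM records found")
    n = len(coords)
    return tuple(sum(c[i] for c in coords) / n for i in range(3))
```

The full protein file is left untouched and still goes into pairs.csv as the protein entry; only the pocket column points at the file written here.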

P.S. I would suggest starting from our image on Docker Hub (docker pull joshuameyers/deeplytough), or building the Docker image yourself, since this repo now has a few stale dependencies that can be a bit fiddly to install.

Hope this helps