anton-bushuiev / PPIformer

Learning to design protein-protein interactions with enhanced generalization (ICLR24)

Home Page: https://arxiv.org/abs/2310.18515

Possible reason for a counter-intuitive result

amin-sagar opened this issue

Hello.
Thanks for this awesome work (along with PPIRef).

I just tried PPIformer on one of my structures.

I ran an alanine scan, and most of the predictions make sense.
There are two alanine mutations away from the interface where the affinity actually increases, but PPIformer predicts a slight destabilization. This is not unexpected, as these mutations possibly act by stabilizing the unbound structure, a mechanism that is hard to understand.

What's unintuitive is the following. There are two neighboring asparagine residues: one makes a hydrogen bond with the other chain, while the other faces the solvent and makes no interactions. Yet PPIformer predicts that mutating the second asparagine (the one that makes no interactions with the partner) to alanine is more destabilizing than mutating the first.

What could be the reason for such behavior?

Unfortunately, I can't share this structure but I will try to reproduce this on other structures that I can share.
Best,
Amin.

Hi Amin!

Thank you very much for your feedback! I am glad to see that most of your predictions made sense.

There may be two explanations for the unintuitive behaviour you describe:

  1. The second asparagine (the one that does not interact with the other chain) may be important for some other basic function (e.g. stability or protein expression). PPIformer is trained in two steps. First, it is pre-trained to fill in masked amino acids in protein-protein interfaces. During this step, the model learns the most likely patterns in natural protein interfaces. As shown, for example, in this paper, this kind of pre-training may bias the model's predictions towards more stable and more easily expressed proteins. In the second step, PPIformer is fine-tuned to predict ddG on the small amount of available data.
  2. The model is not perfect yet. As Table 2 in our paper shows, there is still room for improving PPIformer. Specifically, Flex ddG, a Rosetta-based protocol that is five orders of magnitude slower, achieves better performance on the SKEMPI v2.0 benchmark dataset. This means that PPIformer may still fail in certain cases, and we are currently working to understand them better.
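
To make the first explanation more concrete, here is a minimal sketch of how a masked-residue model could rank the two asparagines. It assumes a mutation is scored by the log-odds of the wild-type versus the mutant amino acid at the masked position, with a higher score read as more destabilizing. The function name and all probabilities are hypothetical and made up for illustration; this is not PPIformer's actual API or output.

```python
import math

def log_odds_score(probs, wild_type, mutant):
    """Hypothetical mutation score: log p(wild type) - log p(mutant).

    `probs` maps one-letter amino acid codes to the probability a
    masked-residue model assigns at that position. Under the assumption
    made here, a larger score means the mutation looks more destabilizing.
    """
    return math.log(probs[wild_type]) - math.log(probs[mutant])

# Made-up probabilities (only the two relevant entries are shown).
# Interface Asn that hydrogen-bonds the partner: the model is fairly unsure.
interface_asn = {"N": 0.30, "A": 0.10}
# Solvent-facing Asn: the model is confident Asn belongs here, for example
# because such positions are strongly conserved in natural interfaces.
solvent_asn = {"N": 0.60, "A": 0.05}

score_interface = log_odds_score(interface_asn, "N", "A")
score_solvent = log_odds_score(solvent_asn, "N", "A")

print(f"interface Asn -> Ala: {score_interface:.3f}")
print(f"solvent Asn -> Ala:   {score_solvent:.3f}")
```

With these illustrative numbers the solvent-facing asparagine gets the higher score, even though it makes no contact with the partner. A score driven by how "natural" a residue looks in its context can therefore disagree with an intuition based on contacts alone.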

Please let me know if you have other questions.

Best,
Anton

Thanks @anton-bushuiev
This makes perfect sense.
Another issue could be that I was trying to predict the changes for a protein-peptide interface.
The stability and expression patterns learned from proteins might not apply to peptides.
I look forward to the fine-tuning scripts so that I can fine-tune the model for protein-peptide interactions.
Feel free to close this issue.