GraDe_IF: Graph Denoising Diffusion for Inverse Protein Folding (NeurIPS 2023)

Description

Implementation of "Graph Denoising Diffusion for Inverse Protein Folding" (arXiv link).

Requirements

To install requirements:

conda env create -f environment.yml

Usage

In the style of denoising-diffusion-pytorch, here is a brief example showing how the discrete diffusion model works.

import sys
sys.path.append('diffusion')

import torch
from torch_geometric.data import Batch
from diffusion.gradeif import GraDe_IF, EGNN_NET
from dataset_src.generate_graph import prepare_graph

# Load a preprocessed protein graph and batch it first,
# because the network dimensions below are read from it.
graph = torch.load('dataset/process/test/3fkf.A.pt')
input_graph = Batch.from_data_list([prepare_graph(graph)])

gnn = EGNN_NET(
    input_feat_dim=input_graph.x.shape[1] + input_graph.extra_x.shape[1],
    hidden_channels=10,
    edge_attr_dim=input_graph.edge_attr.shape[1],
)

diffusion_model = GraDe_IF(gnn)

# Training step: calling the model on a batch returns the loss.
loss = diffusion_model(input_graph)
loss.backward()

# Sampling: generate a sequence from the structure information.
_, sample_seq = diffusion_model.ddim_sample(input_graph)
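
To give a feel for the forward (noising) half of a discrete diffusion over amino acid types, here is a minimal, dependency-free sketch of a uniform transition kernel. It is an illustration under stated assumptions, not the code in this repository: the names `uniform_transition_matrix`, `noise_sequence`, and `beta` are hypothetical, the matrix is the common one-step form Q = (1 - beta) I + (beta / K) 11^T, and the BLOSUM kernel is not covered.

```python
import random

NUM_CLASSES = 20  # the 20 standard amino acid types


def uniform_transition_matrix(beta, num_classes=NUM_CLASSES):
    """Hypothetical one-step kernel: Q = (1 - beta) * I + (beta / K) * ones."""
    off_diag = beta / num_classes
    return [
        [(1.0 - beta) + off_diag if i == j else off_diag for j in range(num_classes)]
        for i in range(num_classes)
    ]


def noise_sequence(seq, beta, rng):
    """Draw each noised residue from the row of Q indexed by the clean residue."""
    q = uniform_transition_matrix(beta)
    classes = list(range(NUM_CLASSES))
    return [rng.choices(classes, weights=q[aa])[0] for aa in seq]


def identity(a, b):
    """Fraction of positions at which two sequences agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


rng = random.Random(0)
clean = [rng.randrange(NUM_CLASSES) for _ in range(200)]  # toy integer-encoded sequence

slightly_noised = noise_sequence(clean, beta=0.05, rng=rng)
fully_noised = noise_sequence(clean, beta=1.0, rng=rng)

print("identity after light noise:", identity(clean, slightly_noised))
print("identity after full noise: ", identity(clean, fully_noised))
```

With a small `beta` most residues stay unchanged, while `beta = 1` replaces every residue with a uniform draw, so the sequence carries no information about the original. The denoising network is trained to reverse this kind of corruption given the structure graph.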

More details can be found in the Jupyter notebook.

Parameter Choices in Sampling

Below is an ablation study of two key parameters of the ddim_sample function, step and diverse, which were tuned to obtain the improved results presented in the paper. All results were computed from 50 ensemble runs; the Jupyter notebook shows how to run the ensembles.

BLOSUM Kernel - Diverse Mode

Step	Recovery Rate	Perplexity	Single Sample Recovery Rate
500	0.5341	4.02	0.505
250	0.5370	4.06	0.4679
100	0.5356	4.98	0.4213
50	0.4827	8.02	0.3745

BLOSUM Kernel - Non-Diverse Mode

Step	Recovery Rate	Perplexity	Single Sample Recovery Rate
500	0.5342	4.02	0.505
250	0.5373	4.12	0.4741
100	0.5351	7.43	0.5016
50	0.4999	16.74	0.4736

Uniform Kernel - Diverse Mode

Step	Recovery Rate	Perplexity	Single Sample Recovery Rate
500	0.5286	4.08	0.5022
250	0.5292	4.13	0.4325
100	0.5329	5.28	0.4222
50	0.5341	5.91	0.4212

Uniform Kernel - Non-Diverse Mode

Step	Recovery Rate	Perplexity	Single Sample Recovery Rate
500	0.5286	4.08	0.5022
250	0.5273	4.09	0.4357
100	0.5238	9.49	0.5095
50	0.5285	15.53	0.5113
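
The metrics in the tables above can be sketched with a few lines of plain Python. This is a hedged illustration, not the evaluation code from the notebook: it assumes each run yields per-residue class probabilities, that an ensemble averages those probabilities across runs, that recovery rate is the fraction of positions where the most probable class matches the native residue, and that perplexity is the exponential of the mean negative log-likelihood of the native residues. The function names and the toy numbers are hypothetical.

```python
import math


def ensemble_average(prob_runs):
    """Average probabilities over runs; prob_runs[r][i][k] is run r, position i, class k."""
    num_runs = len(prob_runs)
    length = len(prob_runs[0])
    num_classes = len(prob_runs[0][0])
    return [
        [sum(run[i][k] for run in prob_runs) / num_runs for k in range(num_classes)]
        for i in range(length)
    ]


def recovery_rate(probs, native):
    """Fraction of positions whose most probable class equals the native residue."""
    hits = sum(
        max(range(len(p)), key=p.__getitem__) == aa for p, aa in zip(probs, native)
    )
    return hits / len(native)


def perplexity(probs, native, eps=1e-12):
    """exp of the mean negative log-likelihood assigned to the native residues."""
    nll = -sum(math.log(max(p[aa], eps)) for p, aa in zip(probs, native)) / len(native)
    return math.exp(nll)


# Toy example: 2 runs, 2 positions, 3 classes (a real run would use 20 classes).
run_a = [[0.7, 0.2, 0.1], [0.2, 0.5, 0.3]]
run_b = [[0.5, 0.4, 0.1], [0.6, 0.2, 0.2]]
native = [0, 1]

averaged = ensemble_average([run_a, run_b])
print("ensemble recovery:  ", recovery_rate(averaged, native))
print("ensemble perplexity:", perplexity(averaged, native))
print("single-run recovery:", recovery_rate(run_a, native))
```

Under these assumptions, the "Single Sample Recovery Rate" column would correspond to scoring one run on its own, as in the last line, while the other two columns would be computed on the averaged probabilities.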

Comments

Our codebase for the EGNN models and discrete diffusion builds on EGNN and DiGress. Thanks to their authors for open-sourcing!

Citation

If you find our code and datasets useful, please cite:

@inproceedings{yi2023graph,
  title={Graph Denoising Diffusion for Inverse Protein Folding},
  author={Kai Yi and Bingxin Zhou and Yiqing Shen and Pietro Lio and Yu Guang Wang},
  booktitle={Thirty-seventh Conference on Neural Information Processing Systems},
  year={2023},
  url={https://openreview.net/forum?id=u4YXKKG5dX}
}
