CSB-KaracaLab / RBD-ACE2-MutBench

Benchmarking structure-based mutation predictors on the ACE2-RBD binding data set


Motivation

Since the start of the COVID-19 pandemic, a huge effort has been devoted to understanding the Spike(SARS-CoV-2)-ACE2 recognition mechanism. As prominent examples, two deep mutational scanning (DMS) studies (Chan et al., 2020; Starr et al., 2020) traced the impact of all possible mutations/variants across the Spike-ACE2 interface. Expanding on this, we benchmark six widely used structure-based binding affinity predictors (FoldX, EvoEF1, MutaBind2, SSIPe, HADDOCK, and UEP) and two recent AI predictors (mmCSM-PPI, TopNetTree) on the variant Spike-ACE2 deep mutational interaction set. Among these approaches, FoldX ranks first with a 64% success rate. Through residue-based analyses, we reveal critical algorithmic biases, especially in ranking mutations that increase or decrease hydrophobicity or volume. We also show that the approaches using evolution-based terms in their scoring functions misclassify most mutations as binding-depleting. The AI approaches, mmCSM-PPI and TopNetTree, perform comparably to the force field-based techniques. These observations suggest that there is plenty of room to improve conventional affinity predictors for predicting the variant-induced binding profile changes of Spike-ACE2.

Our mutant models and their prediction scores can be visualized at https://rbd-ace2-mutbench.github.io/

For more, please check DOI:10.1101/2022.04.18.488633

Folder organization of our repository:

benchmark-data/

raw-data

The mutations are imposed on the RBD-ACE2 complex (PDB ID: 6m0j).

  • ACE2_DMS_benchmark_set.csv: DMS binding values of 179 ACE2 point mutations.
  • RBD_DMS_benchmark_set.csv: DMS binding values of 84 RBD point mutations.
  • [ACE2/RBD]_DMS_all_interface_set: Complete set of 988 interfacial mutations.
  • HADDOCK_scores.csv & FoldX_scores.csv & FoldXwater_scores.csv & EvoEF1_scores.csv: HADDOCK, FoldX, FoldXwater, and EvoEF1 mutant scores (263 mutations) + 6m0j wild-type score.
  • MutaBind2_scores.csv & SSIPe_scores.csv & UEP_scores.csv & TopNetTree_scores.csv & mmCSM-PPI_scores.csv: Predicted ∆∆G changes of each mutation.
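
The HADDOCK, FoldX, FoldXwater, and EvoEF1 files store a score per mutant together with the 6m0j wild-type score, so a per-mutation change has to be derived from the two. A minimal sketch of that step, assuming the change is simply the mutant score minus the wild-type score; the mutation labels and all numbers below are made up for illustration and are not taken from the score files:

```python
# Hypothetical values for illustration only; the real scores live in the *_scores.csv files.
WILD_TYPE_SCORE = -120.0  # assumed wild-type (6m0j) score

mutant_scores = {
    "K417N": -115.5,
    "N501Y": -123.25,
}


def score_changes(scores, wild_type_score):
    """Return the mutant-minus-wild-type score difference for every mutation."""
    return {mutation: score - wild_type_score for mutation, score in scores.items()}


changes = score_changes(mutant_scores, WILD_TYPE_SCORE)
for mutation, delta in changes.items():
    # Under this convention a negative change means the mutant scores more favorably.
    print(f"{mutation}: {delta:+.2f}")
```

MutaBind2, SSIPe, UEP, TopNetTree, and mmCSM-PPI already report ∆∆G, so no subtraction would be needed for them.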

benchmark

  • SARS-CoV-2-RBD_ACE2_benchmarking_dataset.csv: Predicted affinity change scores (∆∆G) of each predictor.
  • UEP_SARS-CoV-2-RBD_ACE2_benchmarking_dataset.csv: UEP calculates ∆∆G only when the position of interest interacts with at least two other residues (highly packed residues within a 5 Å range). This file is therefore a subset of the main prediction scores, covering 129 mutations (82 ACE2 and 47 Spike-RBD mutations).
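
The packing criterion that limits the UEP subset can be illustrated with a simple distance check. This is a sketch of the criterion as described above, not UEP's actual implementation: the function names, the any-atom distance rule, and the toy coordinates are all assumptions.

```python
import math


def count_neighbors(target_atoms, other_residues, cutoff=5.0):
    """Count residues that have at least one atom within `cutoff` Å of the target residue."""
    count = 0
    for atoms in other_residues.values():
        if any(math.dist(a, b) <= cutoff for a in target_atoms for b in atoms):
            count += 1
    return count


def is_scorable(target_atoms, other_residues, cutoff=5.0, min_neighbors=2):
    """Assumed filter: keep a position only if it contacts at least `min_neighbors` residues."""
    return count_neighbors(target_atoms, other_residues, cutoff) >= min_neighbors


# Toy coordinates (Å), one atom per residue, purely illustrative.
target = [(0.0, 0.0, 0.0)]
partners = {
    "res_1": [(3.0, 0.0, 0.0)],   # 3 Å away: in contact
    "res_2": [(0.0, 4.0, 0.0)],   # 4 Å away: in contact
    "res_3": [(10.0, 0.0, 0.0)],  # 10 Å away: too far
}
print(count_neighbors(target, partners), is_scorable(target, partners))
```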

scripts

shell_scripts

  • run_FoldX.csh: Applies single amino acid mutations and computes binding affinity by using FoldX. (Called FoldX commands: Repair, BuildModel, AnalyseComplex).
  • run_FoldXwater.csh: Same as above, but runs FoldX with the water option.
  • run_EvoEF1.csh: Applies single amino acid mutations and computes binding affinity by using EvoEF1. (Called EvoEF1 commands: RepairStructure, BuildMutant, ComputeBinding).
  • get_HADDOCK_scores.csh: Greps the predicted HADDOCK energy scores from the HADDOCK energy files.
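
The three FoldX steps named above can be sketched as a list of command lines. This is an illustration rather than the content of run_FoldX.csh: the helper names, the `foldx` executable name, the chain IDs (A for ACE2, E for RBD in 6m0j), and the `_Repair_1.pdb` output name are assumptions, and the script's "Repair" step is taken to be FoldX's RepairPDB command. The commands are only built and printed, not executed.

```python
def foldx_mutation(wild_type, chain, position, mutant):
    """Format one entry of FoldX's individual_list.txt, e.g. 'NE501Y;'."""
    return f"{wild_type}{chain}{position}{mutant};"


def foldx_commands(pdb_name, chains, foldx="foldx"):
    """Build the RepairPDB -> BuildModel -> AnalyseComplex command lines (assumed file names)."""
    repaired = f"{pdb_name}_Repair"
    return [
        # 1) Repair the input structure.
        [foldx, "--command=RepairPDB", f"--pdb={pdb_name}.pdb"],
        # 2) Build the mutant listed in individual_list.txt on the repaired structure.
        [foldx, "--command=BuildModel", f"--pdb={repaired}.pdb",
         "--mutant-file=individual_list.txt"],
        # 3) Compute the interaction energy between the two partners of the mutant model.
        [foldx, "--command=AnalyseComplex", f"--pdb={repaired}_1.pdb",
         f"--analyseComplexChains={chains}"],
    ]


entry = foldx_mutation("N", "E", 501, "Y")
commands = foldx_commands("6m0j", "A,E")
print(entry)
for command in commands:
    print(" ".join(command))
```

Each inner list could be passed to `subprocess.run` once FoldX is installed and the input files are in place.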

notebooks

  • creating_benchmarking_datasets.ipynb: Creates SARS_CoV_2_RBD_ACE2_benchmarking_dataset.csv and UEP_SARS_CoV_2_RBD_ACE2_benchmarking_dataset.csv files.
  • performance_analysis.ipynb: Calculates success rates of predictors by using SARS_CoV_2_RBD_ACE2_benchmarking_dataset.csv and UEP_SARS_CoV_2_RBD_ACE2_benchmarking_dataset.csv files.
  • metric_analyses_figure_preparation.ipynb: Performs metric analyses (volume, hydrophobicity, flexibility, and physicochemical property change upon a mutation) and visualizes the results as plots.
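
A success rate of the kind computed in performance_analysis.ipynb can be sketched as the fraction of mutations whose predicted direction matches the experimental one. The sign conventions here are assumptions (a positive DMS value means enriched binding, a negative ∆∆G means enriched binding), the function names are hypothetical, and the value pairs are made up; the notebook's own definition may differ.

```python
def is_success(dms_value, predicted_ddg):
    """True when experiment and prediction agree on the direction of the binding change.

    Assumed conventions: DMS value > 0 means enriched binding,
    predicted ddG < 0 means enriched binding. Zero counts as not enriched.
    """
    return (dms_value > 0) == (predicted_ddg < 0)


def success_rate(pairs):
    """Fraction of (dms_value, predicted_ddg) pairs classified in the same direction."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no mutations to evaluate")
    return sum(is_success(dms, ddg) for dms, ddg in pairs) / len(pairs)


# Made-up (DMS value, predicted ddG) pairs for illustration.
example = [(0.8, -1.2), (-0.5, 0.7), (0.3, 0.4), (-1.1, -0.2)]
print(f"{success_rate(example):.0%}")
```

The first two pairs agree in direction and the last two do not, so this toy set gives a 50% success rate.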

Acknowledgements

All calculations were carried out on the HPC resources of the Izmir Biomedicine and Genome Center.

Contact

ezgi.karaca@ibg.edu.tr

References

Chan,K.K., Dorosky,D., Sharma,P., Abbasi,S.A., Dye,J.M., Kranz,D.M., Herbert,A.S. and Procko,E. (2020) Engineering human ACE2 to optimize binding to the spike protein of SARS coronavirus 2. Science, 369, 1261–1265.

Starr,T.N., Greaney,A.J., Hilton,S.K., Ellis,D., Crawford,K.H.D., Dingens,A.S., Navarro,M.J., Bowen,J.E., Tortorici,M.A., Walls,A.C., et al. (2020) Deep Mutational Scanning of SARS-CoV-2 Receptor Binding Domain Reveals Constraints on Folding and ACE2 Binding. Cell, 182, 1295-1310.e20.

We used the stand-alone packages of FoldX, EvoEF1, and UEP, and ran HADDOCK, MutaBind2, and SSIPe through their respective services to generate the mutant models and their scores.
