Code for the paper "State-of-the-Art Estimation of Protein Model Accuracy using AlphaFold." Experiments were run using code cloned from https://github.com/deepmind/alphafold at commit 1d43aaff941c84dc56311076b58795797e49107b. More documentation and scripts coming soon.
The raw data from the analyses in the paper can be found here: https://drive.google.com/drive/folders/1hsLs-Ul1ZpsFWrfpAgeWOJfUjN77h0dO?usp=sharing
This folder contains the following files with ranking results from the Rosetta decoy dataset:
rosetta_alanine.csv -- Results with the decoy sequence set to a sequence of all alanines
rosetta_targetseq.csv -- Results with the decoy sequence set to the target sequence
rosetta_sidechains.csv -- Results with sidechains included in the decoy structures
The fields in these files are as follows:
target -- The PDB ID of the target sequence
decoy_name -- The identifier of the decoy structure. "none" means no decoy was used, and "native" means the native structure was used as a decoy
rmsd_in -- The RMSD of the decoy to the native structure
gdtts_in -- The GDT_TS Score of the decoy to the native structure
tmscore_in -- The TM Score of the decoy to the native structure
plddt -- The predicted LDDT Score from AlphaFold
ptm -- The predicted TM Score from AlphaFold
tmscore_diff -- The TM Score between the decoy structure and AlphaFold's output structure
tmscore_out -- The TM Score of AlphaFold's output structure to the native structure
rmsd_out -- The RMSD of AlphaFold's output structure to the native structure
rosetta_score -- The Rosetta energy of the decoy structure
dan_score -- The DeepAccNet score of the decoy structure
is_native -- Boolean indicator of whether the decoy is the native structure
no_template -- Boolean indicator of whether the prediction was made without a template
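As a rough illustration of how these fields can be used together, the sketch below picks, for each target, the decoy with the highest ptm and reports that decoy's tmscore_in. This is only an assumed usage pattern and not necessarily the ranking score used in the paper. The helper name top_decoy_per_target and the sample rows are hypothetical, and the values are invented so that the example runs without the real CSV files.

```python
import csv
import io


def top_decoy_per_target(rows, score_field="ptm"):
    """Return {target: row} holding the highest-scoring decoy per target.

    Rows whose decoy_name is "none" (no decoy was used) are skipped.
    On ties, the first row encountered is kept.
    """
    best = {}
    for row in rows:
        if row["decoy_name"] == "none":
            continue
        target = row["target"]
        score = float(row[score_field])
        if target not in best or score > float(best[target][score_field]):
            best[target] = row
    return best


# Invented sample data using a subset of the documented columns.
sample = io.StringIO(
    "target,decoy_name,tmscore_in,ptm\n"
    "t1,decoy_a,0.40,0.35\n"
    "t1,decoy_b,0.75,0.62\n"
    "t2,none,-1,0.90\n"
    "t2,decoy_c,0.50,0.48\n"
)
rows = list(csv.DictReader(sample))
best = top_decoy_per_target(rows)

for target, row in best.items():
    print(target, row["decoy_name"], row["tmscore_in"])
```

To run this against the real data, replace the in-memory sample with an open file handle for one of the CSV files listed above, assuming the column names match the field list.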
In addition, the following files are included from the CASP14 evaluation:
casp_alanine.csv -- Results with the decoy sequence set to a sequence of all alanines
casp_targetseq.csv -- Results with the decoy sequence set to the target sequence
casp_sidechains.csv -- Results with sidechains included in the decoy structures
The fields in these files are as follows:
target -- The CASP14 target identifier
decoy_name -- The name of the decoy server submission from CASP14
gddts_in -- The GDT_TS score of the template to the native structure
plddt -- The predicted LDDT Score from AlphaFold
ptm -- The predicted TM Score from AlphaFold
tmscore_diff -- The TM Score between the decoy structure and AlphaFold's output structure
gdtts_out -- The GDT_TS Score of AlphaFold's output structure to the native structure
no_template -- Boolean indicator of whether the prediction was made without a template
Note that, for the CASP data, we were unable to access native structures for targets T1085 and T1086, so output accuracies are unavailable for these targets. In general, numeric values are set to -1 when they are not applicable (for instance, the input TM Score for a line representing AlphaFold's behavior with no template input).
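Because -1 marks "not applicable", it should be treated as missing before averaging or correlating columns. The following is a minimal sketch of one way to do that with the Python standard library. The helper names (to_optional_float, load_rows) and the sample rows are hypothetical, and the values are invented. The conversion should only be applied to fields that cannot legitimately be negative, since an energy such as rosetta_score could in principle take the value -1.

```python
import csv
import io

SENTINEL = -1.0  # per the note above, -1 marks a value that is not applicable


def to_optional_float(value):
    """Convert a CSV cell to float, mapping the -1 sentinel to None."""
    number = float(value)
    return None if number == SENTINEL else number


def load_rows(handle, numeric_fields):
    """Read CSV rows and convert only the listed fields to float or None."""
    rows = []
    for row in csv.DictReader(handle):
        for field in numeric_fields:
            row[field] = to_optional_float(row[field])
        rows.append(row)
    return rows


# Invented sample data: the "none" row has no input decoy, so its
# input TM Score is recorded as -1.
sample = io.StringIO(
    "target,decoy_name,tmscore_in,ptm\n"
    "t1,none,-1,0.70\n"
    "t1,decoy_a,0.55,0.50\n"
)
rows = load_rows(sample, ["tmscore_in", "ptm"])

# Average only over rows where the value is actually defined.
defined = [r["tmscore_in"] for r in rows if r["tmscore_in"] is not None]
mean_tmscore_in = sum(defined) / len(defined)
```

Mapping the sentinel to None makes accidental arithmetic on a placeholder fail loudly instead of silently skewing a mean.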
The Rosetta decoy set can be found here: https://files.ipd.uw.edu/pub/decoyset/decoys.zip