Multi-state modeling of protein structures using AlphaFold

You can model GPCR or kinase structures in multiple states using this Google Colab notebook.

Updates

May 15, 2023: Multi-state models are available for all human odorant receptors.
Apr. 24, 2023: GPCR activation state-annotated databases are updated.

Building state-annotated HHsuite databases

All the required scripts and examples are in build_state_annotated_databases

Getting activation state annotations for available experimental GPCR structures
The list of GPCR structures with activation state annotations: GPCRdb, Activation state definition
Preparing input files for building state-annotated HHsuite databases
The script takes a list of PDB IDs for a single state: active, inactive, or intermediate. For example, GPCR.Active, GPCR.Inactive, and GPCR.Intermediate are the lists of active, inactive, and intermediate state GPCRs used in this study, respectively. In addition, a list of PDB IDs with their preferred chains is required, so that the preferred chain can be selected when a PDB file contains multiple chains. See the example in build_state_annotated_databases.
Running the script
The script is based on the official guideline for building customized HHsuite databases. Running it requires HHsuite and the UniClust30 database. You also need to edit build_db.sh to set the path of the UniClust30 database.
Example command:

./build_db.sh GPCR.${state}

Expected outputs
The script produces a set of HHsuite database files for the given GPCR state:

GPCR100.${state}_a3m.ff{data,index}
GPCR100.${state}_hhm.ff{data,index}
GPCR100.${state}_cs219.ff{data,index}
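A quick way to confirm that a build finished is to check for the six files listed above. The following is a minimal sketch, not part of the repository: the function names are hypothetical, and it assumes the files are written to the directory you pass in and that the state label is spelled as in the input list name (for example, Active for GPCR.Active).

```python
from pathlib import Path

# File name pieces taken from the expected-output listing above.
SUFFIXES = ("a3m", "hhm", "cs219")
EXTENSIONS = ("ffdata", "ffindex")


def expected_db_files(state, prefix="GPCR100"):
    """Return the six database file names expected for one state."""
    return [f"{prefix}.{state}_{suffix}.{ext}"
            for suffix in SUFFIXES
            for ext in EXTENSIONS]


def missing_db_files(directory, state, prefix="GPCR100"):
    """Return the expected file names that are not present in `directory`."""
    directory = Path(directory)
    return [name for name in expected_db_files(state, prefix)
            if not (directory / name).is_file()]


if __name__ == "__main__":
    # Assumption: the databases were built in the current directory.
    for state in ("Active", "Inactive", "Intermediate"):
        missing = missing_db_files(".", state)
        print(state, "complete" if not missing else f"missing: {missing}")
```

An empty list from missing_db_files means all three database components (a3m, hhm, cs219) have both their data and index files.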

Pre-built GPCR/Kinase databases
State-annotated GPCR databases can be obtained from our repositories on Zenodo or Google Drive.

GPCR structure prediction using AlphaFold

The structure prediction scripts rely on AlphaFold. We slightly modified it to conduct ablation studies and to model GPCR structures in a specific activation state. Follow the setup procedure and download genetic databases and model parameters for AlphaFold. In contrast to the original AlphaFold, our scripts are based on a non-Docker version and run on top of an Anaconda environment for AlphaFold. To create an environment for running AlphaFold, one may refer to an issue page of the AlphaFold repository or execute commands described in our script.

Prerequisite

AlphaFold package
Anaconda environment for AlphaFold
Activation state annotated GPCR100 databases

Update libconfig_alphafold.py
The following settings need to be updated:

Paths for executables: jackhmmer, hhblits, hhsearch, kalign
Paths for genetic databases: DOWNLOAD_DIR, {uniref90, mgnify, bfd, small_bfd, uniclust30, pdb70}_database_path, template_mmcif_dir, obsolete_pdbs_path
Paths for activation state annotated GPCR100 databases: gpcr100_active_db_path, gpcr100_inactive_db_path
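To find the values for the executable paths, one option is to look the tools up on the current PATH from inside the activated environment. This is a small sketch under that assumption; locate_tools is a hypothetical helper, and how the paths are named inside libconfig_alphafold.py should be taken from that file itself.

```python
import shutil

# Executables named in the list above.
REQUIRED_TOOLS = ("jackhmmer", "hhblits", "hhsearch", "kalign")


def locate_tools(names=REQUIRED_TOOLS):
    """Map each tool name to its absolute path on PATH, or None if absent."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in locate_tools().items():
        print(f"{name:10s} {path or 'NOT FOUND'}")
```

Any tool reported as NOT FOUND is either not installed or not on the PATH of the current environment, so its path has to be set by hand.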

GPCR structure predictions
The commands below assume an activated Anaconda environment that has all the libraries and packages required to run AlphaFold.

Modeling GPCRs in a specific activation state (this study)

./structure_prediction/run.py ${FASTA_FILE} --preset study --state active    # for modeling in active state
./structure_prediction/run.py ${FASTA_FILE} --preset study --state inactive  # for modeling in inactive state

In addition to this script, you may use the ColabFold-based script that our Colab notebook uses:

./structure_prediction/run_colabfold.py ${FASTA_FILE} --state active   # for modeling in active state
./structure_prediction/run_colabfold.py ${FASTA_FILE} --state inactive # for modeling in inactive state

Note that this script is optimized for our Colab notebook environment and has not been extensively tested. Running it also creates a directory, gpcr100, which contains symbolic links to the required database files.

The original AlphaFold protocol

./structure_prediction/run.py ${FASTA_FILE} --preset original

Other protocols for the ablation study as described in the paper

# running the original AlphaFold protocol but using activation state-annotated GPCR databases
./structure_prediction/run.py ${FASTA_FILE} --preset original --state active     # for modeling in active state
./structure_prediction/run.py ${FASTA_FILE} --preset original --state inactive   # for modeling in inactive state

# running AlphaFold using sequence and MSA-based features, without structure templates-based features
./structure_prediction/run.py ${FASTA_FILE} --preset no_templ

# running AlphaFold using sequence-based features only, without MSA and structure templates-based features
./structure_prediction/run.py ${FASTA_FILE} --preset seqonly

# running MODELLER
./structure_prediction/run.py ${FASTA_FILE} --preset tbm
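When running several protocols for the same sequence, the commands above can be assembled programmatically. The sketch below only reproduces the preset and state values that appear in this README; build_command is a hypothetical helper, receptor.fasta is a placeholder file name, and the sketch does not check which preset and state combinations run.py actually accepts.

```python
import shlex

# Values taken from the commands shown in this README.
PRESETS = ("study", "original", "no_templ", "seqonly", "tbm")
STATES = ("active", "inactive")


def build_command(fasta_file, preset, state=None,
                  script="./structure_prediction/run.py"):
    """Return the argument list for one run.py invocation."""
    if preset not in PRESETS:
        raise ValueError(f"unknown preset: {preset!r}")
    if state is not None and state not in STATES:
        raise ValueError(f"unknown state: {state!r}")
    command = [script, fasta_file, "--preset", preset]
    if state is not None:
        command += ["--state", state]
    return command


if __name__ == "__main__":
    # Placeholder input; replace with the path to your own FASTA file.
    for state in STATES:
        print(shlex.join(build_command("receptor.fasta", "study", state)))
```

The returned list can be passed directly to subprocess.run, which avoids shell quoting problems with unusual file names.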

Sampling of intermediate conformations

./structure_prediction/interpolate.py --fasta_path=${FASTA_FILE} \
                                      --pdb_init=${INACTIVE_MODEL},${ACTIVE_MODEL} \
                                      --unk_pdb=True \
                                      --interpolate_region=${TM_RESIDUES}

Both the active and inactive state models must be generated before they are passed to the script. The --interpolate_region option is optional, but it may improve the structure comparison between states. An example value is "19-51,56-87,92-127,136-167,183-223,376-413,418-443".
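To make the region format concrete, the sketch below reads such a string as comma-separated, inclusive residue ranges. This is an illustrative reading of the example value, not the parser used by interpolate.py; the function names are hypothetical, and accepting a single residue number such as "45" is an added assumption.

```python
def parse_region(text):
    """Parse "19-51,56-87" into a list of inclusive (first, last) tuples."""
    ranges = []
    for token in text.split(","):
        token = token.strip()
        if not token:
            continue
        start, sep, end = token.partition("-")
        first = int(start)
        # Assumption: a bare number denotes a single-residue range.
        last = int(end) if sep else first
        if last < first:
            raise ValueError(f"range end precedes start: {token!r}")
        ranges.append((first, last))
    return ranges


def count_residues(ranges):
    """Total number of residues covered by inclusive ranges."""
    return sum(last - first + 1 for first, last in ranges)


if __name__ == "__main__":
    example = "19-51,56-87,92-127,136-167,183-223,376-413,418-443"
    segments = parse_region(example)
    print(len(segments), "segments,", count_residues(segments), "residues")
```

Read this way, the example value describes seven segments, which is consistent with the seven transmembrane helices of a GPCR.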

Running the protocol on Colab

A slightly modified protocol using the ColabFold pipeline is implemented on Colab. The main difference is the MSA generation step: the ColabFold-based protocol uses MMseqs2 for homologous sequence searches.

GPCR models in the active and inactive states

We have modeled non-olfactory human GPCRs in the active and inactive states using our multi-state modeling protocol. The models are available via Zenodo or Google Drive.

References

[1] Heo, L. and Feig, M., Multi-State Modeling of G-protein Coupled Receptors at Experimental Accuracy, Proteins (2022), doi:10.1002/prot.26382.
[2] Jumper, J. et al., Highly accurate protein structure prediction with AlphaFold, Nature (2021), 596, 583-589.
[3] Mirdita, M. et al., ColabFold - Making protein folding accessible to all, Nature Methods (2022), 19, 679-682.

huhlim / alphafold-multistate