TauferLab / XPSI

Framework for identifying protein structural properties from diffraction patterns.



XPSI: X-ray Free Electron Laser (XFEL) based Protein Structure Identifier

This repository contains the framework of XPSI: X-ray Free Electron Laser (XFEL) based Protein Structure Identifier. This framework combines deep learning and traditional machine learning to identify three structural properties (i.e., orientation, conformation, and protein type) through the diffraction patterns of a given protein.

Motivation: Proteins and other biological molecules are responsible for many vital cellular functions, and the structure of a protein determines its function. Identifying a protein's structure therefore helps us understand its functional mechanisms, which in turn can help solve difficult problems such as determining the causes of diseases and designing drugs.

Diffraction patterns: Diffraction patterns are images generated by applying an X-ray Free Electron Laser (XFEL) beam to proteins. These images can reveal the inner structure of a protein. Specifically, three properties can be embedded in an image: the orientation of a protein conformation, the conformations of different folded proteins, and the type of protein.

Framework overview: The input data are diffraction patterns (i.e., images that embed the structure of proteins), generated either by simulations or by experiments using an XFEL beam. The higher the beam intensity, the higher the resolution and precision of the diffraction patterns. The patterns are processed by an autoencoder that captures their key information and produces a tensor representation of each pattern. The autoencoder consists of an encoder and a decoder: the encoder is built from three convolutional filters and downsampling layers, and the decoder mirrors this structure in reverse. The resulting latent space is used to train and validate traditional machine learning models such as k-nearest neighbors (kNN). We use a kNN angle regressor to predict orientation and a kNN classifier to predict protein conformation.
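The kNN stage can be sketched in plain NumPy. This is a minimal illustration, not XPSI's implementation: it assumes the autoencoder has already produced one flattened latent vector per pattern, that conformation labels are discrete, and that each orientation is a single angle in degrees (a real orientation may have several angle components, each of which could be treated the same way). The function names are hypothetical. The angle regressor uses a circular mean, because a plain arithmetic mean of 359° and 1° would give 180° instead of roughly 0°.

```python
import numpy as np


def knn_indices(train_latent, query_latent, k):
    """Indices of the k nearest training vectors (Euclidean) for each query vector."""
    # Pairwise squared distances, shape (n_query, n_train)
    d2 = ((query_latent[:, None, :] - train_latent[None, :, :]) ** 2).sum(axis=-1)
    return np.argsort(d2, axis=1)[:, :k]


def knn_classify(train_latent, train_labels, query_latent, k=5):
    """Majority vote over the k nearest neighbors (e.g., conformation labels)."""
    idx = knn_indices(train_latent, query_latent, k)
    preds = []
    for row in idx:
        labels, counts = np.unique(train_labels[row], return_counts=True)
        preds.append(labels[np.argmax(counts)])
    return np.array(preds)


def knn_angle_regress(train_latent, train_angles_deg, query_latent, k=5):
    """Circular mean of the neighbors' angles, returned in [0, 360) degrees."""
    idx = knn_indices(train_latent, query_latent, k)
    rad = np.deg2rad(train_angles_deg[idx])  # shape (n_query, k)
    # Average the unit vectors (cos, sin) and take the direction of the result.
    mean = np.arctan2(np.sin(rad).mean(axis=1), np.cos(rad).mean(axis=1))
    return np.rad2deg(mean) % 360.0


if __name__ == "__main__":
    # Toy 2-D "latent vectors" standing in for autoencoder output.
    train = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
    angles = np.array([359.0, 1.0, 90.0, 92.0])
    confs = np.array(["A", "A", "B", "B"])
    query = np.array([[0.05, 0.0], [5.05, 5.0]])
    print(knn_classify(train, confs, query, k=2))
    print(knn_angle_regress(train, angles, query, k=2))
```

In this toy example the queries are classified as "A" and "B", and the predicted angles are about 0° (modulo 360, so it may print just below 360) and about 91°. In practice a library implementation such as scikit-learn's neighbors module would replace the brute-force distance matrix.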

To run the XPSI framework, open the Jupyter notebook xpsi_research.ipynb.

Installation

The software stack required to run the XPSI framework can be installed using either Anaconda or pip. Each option installs the required dependencies.
Dependencies:

  • Python=3.7.7
  • numpy
  • pandas
  • scipy
  • pillow
  • scikit-learn=0.23.1
  • seaborn
  • matplotlib
  • configparser
  • tensorflow=2.0.0
  • jupyter
  • ipyfilechooser

In addition, wget is required to download the data.
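After installing with either option below, a quick sanity check can confirm that the listed dependencies and wget are available. This is a hedged sketch, not part of XPSI: the mapping from package names to import names (for example pillow to PIL, scikit-learn to sklearn, jupyter to jupyter_core) is our assumption, and only presence is checked, not the pinned versions.

```python
import importlib.util
import shutil
import sys

# Package names from the dependency list mapped to the module names they are
# imported under (assumed usual import names, not taken from the XPSI repo).
REQUIRED_MODULES = {
    "numpy": "numpy",
    "pandas": "pandas",
    "scipy": "scipy",
    "pillow": "PIL",
    "scikit-learn": "sklearn",
    "seaborn": "seaborn",
    "matplotlib": "matplotlib",
    "configparser": "configparser",
    "tensorflow": "tensorflow",
    "jupyter": "jupyter_core",
    "ipyfilechooser": "ipyfilechooser",
}


def check_environment():
    """Report the Python version and whether each dependency and wget can be found.

    find_spec locates a module without importing it, so this is fast even for
    tensorflow. Pinned versions (e.g., scikit-learn=0.23.1) are not verified.
    """
    report = {"python": "%d.%d.%d" % sys.version_info[:3]}
    for package, module in REQUIRED_MODULES.items():
        report[package] = importlib.util.find_spec(module) is not None
    report["wget"] = shutil.which("wget") is not None
    return report


if __name__ == "__main__":
    for name, value in check_environment().items():
        print("%-15s %s" % (name, value))
```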

Using Anaconda (Preferred installation option)

If you do not have Anaconda installed, you can follow the instructions here to install it. Make sure to change the prefix in install/env_conda.yml to the location of Anaconda on your local machine (e.g., /opt/anaconda3/, /home/opt/anaconda3/).
Then run the following commands on your local machine:

conda env create -f install/env_conda.yml
conda activate xpsi

Once your environment is installed, you can launch Jupyter Notebook and open xpsi_research.ipynb.

Using Anaconda on Power9 processors (e.g., Summit, Tellico)

When using Conda on Power9 processors, such as the Summit or Tellico clusters, the IBM channel that provides TensorFlow must first be registered with Conda. To do so, run the following commands on your Power9 cluster:

conda config --prepend channels https://public.dhe.ibm.com/ibmdl/export/pub/software/server/ibm-ai/conda/
conda env create -f install/env_conda_power9.yml
conda activate xpsi_p9

Using Anaconda on Jetstream

When using Conda on Jetstream, start by opening the web shell desktop of your virtual machine instance. Then open a terminal in the web shell desktop and enter the ezj command. Since Anaconda is pre-installed on Jetstream, this command displays the location of Anaconda, for example: Anaconda installed to /opt/anaconda3.

Then open the <project_directory>/install/env_conda.yml file and change the prefix: field to the Anaconda location displayed by ezj. Using the example above, you would set the prefix as follows: prefix: /opt/anaconda3/envs/xpsi
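If you prefer not to edit the file by hand, the prefix change can be scripted. A minimal sketch with hypothetical helper names: it assumes the ezj output contains a line in the format shown above ("Anaconda installed to ...") and that env_conda.yml has a top-level prefix: line. It edits the text line by line instead of using a YAML parser, so comments and formatting in the file are preserved.

```python
import re


def prefix_from_ezj_output(text, env_name="xpsi"):
    """Build an env prefix from ezj output such as 'Anaconda installed to /opt/anaconda3'."""
    match = re.search(r"Anaconda installed to\s+(\S+)", text)
    if match is None:
        raise ValueError("Anaconda location not found in the ezj output")
    # Drop a trailing slash or sentence period before appending envs/<name>.
    anaconda_dir = match.group(1).rstrip("/.")
    return anaconda_dir + "/envs/" + env_name


def replace_prefix_line(yml_text, prefix):
    """Return the environment-file text with its top-level 'prefix:' line set to prefix."""
    lines = yml_text.splitlines()
    new_line = "prefix: " + prefix
    for i, line in enumerate(lines):
        if line.startswith("prefix:"):
            lines[i] = new_line
            break
    else:
        lines.append(new_line)  # no prefix line yet: add one at the end
    return "\n".join(lines) + "\n"


def set_conda_prefix(yml_path, prefix):
    """Rewrite the prefix of the conda environment file at yml_path in place."""
    with open(yml_path) as f:
        text = f.read()
    with open(yml_path, "w") as f:
        f.write(replace_prefix_line(text, prefix))
```

For example, set_conda_prefix("install/env_conda.yml", prefix_from_ezj_output("Anaconda installed to /opt/anaconda3")) would set the prefix to /opt/anaconda3/envs/xpsi.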

Now, you can create a conda environment by running the following commands:

conda env create -f install/env_conda.yml
conda activate xpsi

Once your environment is installed, you can launch Jupyter Notebook and open xpsi_research.ipynb.

Using pip

You need Python 3.7.7 installed; pip is normally installed along with it. If you do not have pip, follow the instructions here to install it.
Then run the following command on your local machine:

python -m pip install -r install/env_pip.txt

Once your environment is installed, you can launch Jupyter Notebook and open xpsi_research.ipynb.

Launching the Jupyter notebook on HPC Clusters

There are different ways to launch a Jupyter notebook on an HPC cluster. Here we provide two options, but this step depends on the specifications of your cluster.

In an interactive node

Below is an example protocol for running a Jupyter notebook in an interactive node on a high-performance computing (HPC) cluster. The instructions have been adapted from this webpage. First, request an interactive node. We show examples for two schedulers: SLURM and LSF.

## For SLURM
srun --nodes=1 --ntasks-per-node=1 --time=01:00:00 --pty bash -i

## For LSF
# CPU only
bsub -n 1 -Is bash
# If your node has a GPU
bsub -gpu "num=1:mode=exclusive_process" -Is bash

From within the interactive node, activate the conda environment:

cd $HOME
conda activate xpsi

Run Jupyter on the claimed interactive node, and note the node name (for example, the output of hostname):

jupyter notebook --no-browser --ip='0.0.0.0'

On your local terminal, start another SSH session with tunnelling, using the interactive node name noted above:

ssh user@host -L8888:nodeName:8888 -N
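To avoid mistyping the node name, you can print the tunnel command on the interactive node itself and paste it into your local terminal. A small sketch with a hypothetical helper: user and login_host are placeholders for your account and the cluster's login host, and socket.gethostname() returns the name of the machine the script runs on (on some clusters this is a fully qualified name).

```python
import socket


def tunnel_command(user, login_host, node=None, port=8888, local_port=None):
    """Build the SSH port-forwarding command for a Jupyter server on `node`.

    If node is omitted, the current machine's hostname is used, so run this
    on the interactive node.
    """
    node = node or socket.gethostname()
    local_port = port if local_port is None else local_port
    # -L <local_port>:<node>:<port> forwards local_port on your computer to port
    # on the node; -N opens the tunnel without running a remote command.
    return "ssh %s@%s -L%d:%s:%d -N" % (user, login_host, local_port, node, port)


if __name__ == "__main__":
    # "user" and "login.example.org" are placeholders, not real values.
    print(tunnel_command("user", "login.example.org"))
```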

Copy the URL that the Jupyter server printed when it started and paste it into the browser on your computer. The URL should look similar to http://(nodeName or 127.0.0.1):8888/?token=3f7c3a8949b3fa1961c63653873fea075a93a29bffe373b5. Because the SSH tunnel forwards port 8888 on your computer to the node, use 127.0.0.1 as the host in the URL.
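If the printed URL uses the node name, your local machine usually cannot resolve it, while the tunnel listens on 127.0.0.1. A minimal sketch (the helper name is hypothetical) that swaps the host for the local end of the tunnel while keeping the path and token, using the standard urllib.parse functions:

```python
from urllib.parse import urlsplit, urlunsplit


def local_jupyter_url(url, local_port=8888):
    """Point a Jupyter URL at the local end of the SSH tunnel, keeping path and token."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, "127.0.0.1:%d" % local_port,
                       parts.path, parts.query, parts.fragment))


if __name__ == "__main__":
    print(local_jupyter_url("http://nodeName:8888/?token=3f7c3a8949b3fa1961c63653873fea075a93a29bffe373b5"))
```

For the example above this prints http://127.0.0.1:8888/?token=3f7c3a8949b3fa1961c63653873fea075a93a29bffe373b5. Pass a different local_port if you forwarded another local port in the ssh -L option.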

In a JupyterLab server

Some supercomputers, such as Summit at ORNL, provide a JupyterHub. The OLCF JupyterHub implementation spawns you into a single-user JupyterLab environment; follow these instructions to access the OLCF JupyterHub. For other supercomputers, contact your computing facility's support team for help running Jupyter.

Copyright and License

Copyright (c) 2022, Global Computing Lab

XPSI is distributed under the terms of the Apache License, Version 2.0 with LLVM Exceptions.

See LICENSE for more details.
