ASLeonard / SuBSeA

Package to test protein predictions through the PDB

License: GPL v3

Subunit Binding Sequence Alignment (SuBSeA)

Software to test qualitative hypotheses generated from the polyomino evolution through duplication study.

This program compares the binding residues from macromolecular interfaces with optimal sequence alignment to estimate the likelihood that two protein complex interactions are related. A more detailed description of the analysis can be found in the methods here.
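As a rough illustration of what "optimal sequence alignment" means here, the sketch below scores a global (Needleman-Wunsch) alignment between two sequences. It is not the SuBSeA implementation: the function name, the flat match/mismatch scores, and the linear gap penalty are all assumptions chosen for brevity, whereas a real protein aligner would typically use a substitution matrix and affine gap penalties.

```python
def global_alignment_score(seq_a, seq_b, match=1, mismatch=-1, gap=-1):
    """Return the optimal global alignment score of two sequences.

    Hypothetical helper for illustration only: uses flat match/mismatch
    scores and a linear gap penalty (assumed values, not SuBSeA's).
    """
    # Row 0: aligning an empty prefix of seq_a against prefixes of seq_b.
    previous = [j * gap for j in range(len(seq_b) + 1)]
    for i in range(1, len(seq_a) + 1):
        # Column 0: aligning a prefix of seq_a against an empty seq_b.
        current = [i * gap]
        for j in range(1, len(seq_b) + 1):
            same = seq_a[i - 1] == seq_b[j - 1]
            diagonal = previous[j - 1] + (match if same else mismatch)
            gap_in_b = previous[j] + gap      # seq_a residue aligned to a gap
            gap_in_a = current[j - 1] + gap   # seq_b residue aligned to a gap
            current.append(max(diagonal, gap_in_b, gap_in_a))
        previous = current
    return previous[-1]
```

Identical sequences score their full length under these assumed parameters, and each unavoidable gap lowers the score by one.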

Components

There are several main components to the analysis pipeline, outlined below.

  • Protein complex dataset generation
    • utility.py
  • Bioinformatic data pulling
    • domains.py
    • pisa_XML.py
  • SuBSeA analysis
    • pipeline_runner.py
    • binding_alignment.py
  • Visualisation
    • pdb_visualisation.py

Install

This software has been tested on Python 3.6+ and requires several common packages, listed in requirements.txt.

In addition, a working version of needle from EMBOSS is necessary. Other implementations of needle have not been tested, but should work provided they produce similar output.

Testing

Functionality can be tested by running the following command.

python -m pytest tests/

Errors at this stage are likely due to a missing needle executable or missing required Python packages.
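One quick way to rule out the first cause is to check whether needle is discoverable on the PATH. The snippet below is a minimal sketch using only the standard library. The helper name needle_available is hypothetical and not part of this package, and it assumes the executable is installed under the name "needle".

```python
import shutil


def needle_available(executable="needle"):
    """Return True if the named executable can be found on PATH.

    Hypothetical helper; the default name "needle" is an assumption
    about how EMBOSS is installed on the system.
    """
    return shutil.which(executable) is not None


if __name__ == "__main__":
    if needle_available():
        print("needle found on PATH")
    else:
        print("needle not found on PATH; install EMBOSS before running the tests")
```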

Usage examples

A simple example can be run by providing the two subunits to compare.

python binding_alignment.py 3WJM A 3WJM B

This calculates the SuBSeA confidence between the two heteromeric interfaces of the protein complex 3WJM.

If the interaction under examination is not isologous, alternate chains can be provided for comparison.

python binding_alignment.py 2IX2 A 2IX2 B --alternate_chains C A

This compares the interaction between chains A->C with the interaction between chains B->A.

Interfaces can also be compared across subunits from different complexes, for example when analysing homomeric precursors.

python binding_alignment.py 15C8 L 4OFD A --alternate_chains H B

As before, this compares the interaction between 15C8 chains L->H with the interaction between 4OFD chains A->B.
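When many such comparisons need to be scripted, the command lines above can be assembled programmatically. The following is a minimal sketch that only mirrors the positional arguments and the --alternate_chains flag shown in these examples. The function build_command is hypothetical and not part of this package, and it assumes the script is run from the repository root.

```python
import sys


def build_command(pdb_1, chain_1, pdb_2, chain_2, alternate_chains=None):
    """Build the argument list for one binding_alignment.py comparison.

    Hypothetical helper: it reproduces the command-line layout shown in
    the usage examples and makes no assumptions about other options.
    """
    command = [sys.executable, "binding_alignment.py",
               pdb_1, chain_1, pdb_2, chain_2]
    if alternate_chains is not None:
        # One alternate chain per subunit, in the same order as the subunits.
        alternate_1, alternate_2 = alternate_chains
        command += ["--alternate_chains", alternate_1, alternate_2]
    return command
```

The returned list can be passed to subprocess.run(command, check=True), which avoids shell quoting issues.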

A larger scale analysis can be conducted with

python pipeline_runner.py --pullINT

This compiles the heteromeric comparisons needed from the dataset and automatically downloads any associated files.

Limitations

Not all protein complexes are stored in standard formats. In particular, the PDB and PDBePISA often conflict regarding quaternary structure and active interactions. When such compatibility issues arise, certain interactions are often calculated incorrectly, which can produce a meaningless result with no alignment.

The full analysis requires

  • FASTA sequence
  • PDBePISA macromolecular interfaces
  • CATH domains (or other homology identifier)

Protein complexes with incomplete data cannot be analysed reliably, so only a subset of recorded proteins can be used.
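Screening for missing inputs before running a comparison can save time. The sketch below assumes each complex is described by a plain dictionary with the keys "fasta", "interfaces", and "domains". These keys, the record layout, and the function missing_data are hypothetical and do not reflect how this package stores its data.

```python
# Hypothetical field names standing in for the three required inputs:
# FASTA sequence, PDBePISA interfaces, and CATH (or similar) domains.
REQUIRED_FIELDS = ("fasta", "interfaces", "domains")


def missing_data(record):
    """Return the required fields that are absent or empty in a record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


def usable_complexes(records):
    """Keep only the identifiers whose records have every required field."""
    return [pdb_id for pdb_id, record in records.items()
            if not missing_data(record)]
```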
