fsimkovic / GaussDCA

Python implementation of GaussDCA using Cython.

GaussDCA (Cython)

Python implementation of GaussDCA using Cython. Adapted from here.

For the original paper, please refer to "Fast and accurate multivariate Gaussian modeling of protein families: Predicting residue contacts and protein-interaction partners" by Carlo Baldassi, Marco Zamparo, Christoph Feinauer, Andrea Procaccini, Riccardo Zecchina, Martin Weigt and Andrea Pagnani (2014), PLoS ONE 9(3): e92721.

This version implements what is called the "slow fallback" in the original Julia implementation.

Installation

Runs on Python 3.6.

  1. Make sure all dependencies are installed: pip install -r requirements.txt
  2. Compile the cython source code: cd src; python setup.py build_ext -i; cd ..

Usage

python gaussdca/gaussdca.py [-h] [-o OUTPUT] [-t THREADS] alignment_file alignment_format

For now, the alignment format must be specified as one of ConKit's data formats. The output is printed, or saved to a file if -o OUTPUT is given. The number of threads used for multiprocessing can be set with -t THREADS.
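As an illustration of how the usage line above maps onto arguments, here is a minimal argparse sketch. It is not taken from gaussdca.py: the long option names (--output, --threads), the defaults, and the help strings are assumptions made for the example.

```python
import argparse


def build_parser():
    """Build a parser that accepts the usage line shown above (illustrative only)."""
    parser = argparse.ArgumentParser(prog="gaussdca.py")
    # Positional arguments, in the order given in the usage line.
    parser.add_argument("alignment_file", help="path to the input alignment")
    parser.add_argument("alignment_format", help="one of ConKit's alignment formats")
    # Optional arguments; long names and defaults are assumptions.
    parser.add_argument("-o", "--output", default=None,
                        help="output file (print the result if omitted)")
    parser.add_argument("-t", "--threads", type=int, default=1,
                        help="number of threads for multiprocessing")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

With such a parser, an invocation for the test alignment on 8 threads would pass the arguments -t 8 -o out.txt test/large.a3m a3m, where out.txt is a placeholder file name.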

Performance

The following chart shows the elapsed runtime in minutes for a large test alignment (test/large.a3m) using 8 cores.

[Figure: performance]

The first three bars show the effect of using different methods to do the matrix inversion:

  • pinv: pseudoinverse from numpy.linalg (uses SVD)
  • inv: multiplicative inverse from numpy.linalg
  • inv(chol): computes the Cholesky decomposition first and then inverts the matrix
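The three inversion strategies listed above can be sketched with NumPy as follows. This is an illustration rather than the code used in this repository: the function names and the random test matrix are hypothetical, and the Cholesky route assumes the matrix is symmetric positive definite.

```python
import numpy as np


def invert_pinv(matrix):
    """Moore-Penrose pseudoinverse, computed by NumPy via SVD."""
    return np.linalg.pinv(matrix)


def invert_inv(matrix):
    """General-purpose multiplicative inverse."""
    return np.linalg.inv(matrix)


def invert_chol(matrix):
    """Inverse via Cholesky factorisation.

    Assumes `matrix` is symmetric positive definite; np.linalg.cholesky
    raises LinAlgError otherwise.
    """
    lower = np.linalg.cholesky(matrix)   # matrix == lower @ lower.T
    lower_inv = np.linalg.inv(lower)     # inverse of the triangular factor
    return lower_inv.T @ lower_inv       # inv(matrix) == inv(lower).T @ inv(lower)


def make_spd(size, seed=0):
    """Build a random symmetric positive-definite matrix for testing (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((size, size))
    return a @ a.T + size * np.eye(size)


if __name__ == "__main__":
    m = make_spd(5)
    for invert in (invert_pinv, invert_inv, invert_chol):
        # Each method should give a matrix whose product with m is the identity.
        print(invert.__name__, np.allclose(invert(m) @ m, np.eye(5)))
```

For a well-conditioned symmetric positive-definite matrix, all three functions should agree to within floating-point tolerance, so the choice between them is mainly a matter of runtime.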

The next bar, "inv(chol) opt", uses the same inversion as above, but with some additional technical optimizations.

The last bar, "julia", shows the runtime of the Julia implementation on 8 cores, with alignment compression.

Alignment compression has not been implemented in this version yet.

About

License: GNU General Public License v3.0