GaussDCA (Cython)
Python implementation of GaussDCA using Cython. Adapted from here.
For the original paper please refer to "Fast and accurate multivariate Gaussian modeling of protein families: Predicting residue contacts and protein-interaction partners" by Carlo Baldassi, Marco Zamparo, Christoph Feinauer, Andrea Procaccini, Riccardo Zecchina, Martin Weigt and Andrea Pagnani, (2014) PLoS ONE 9(3): e92721.
This version implements what is called the "slow fallback" in the original Julia implementation.
Installation
Runs in Python 3.6
- Make sure all dependencies are installed:
pip install -r requirements.txt
- Compile the Cython source code:
cd src; python setup.py build_ext -i; cd ..
Usage
python gaussdca/gaussdca.py [-h] [-o OUTPUT] [-t THREADS] alignment_file alignment_format
For now, the alignment format must be given as one of ConKit's data formats. The output is printed, or saved to a file if one is given with -o OUTPUT. The number of threads used for multiprocessing can be set with -t THREADS.
Performance
The following chart shows the elapsed runtime in minutes for a large test alignment (test/large.a3m) using 8 cores.
The first three bars show the effect of using different methods to do the matrix inversion:
- pinv: pseudoinverse from numpy.linalg (uses SVD)
- inv: multiplicative inverse from numpy.linalg
- inv(chol): computes the Cholesky decomposition first and then inverts the matrix
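The three inversion strategies listed above can be compared with a minimal NumPy sketch. This is not the code used in this repository: the function names (`make_spd`, `invert_pinv`, `invert_inv`, `invert_chol`) and the random test matrix are hypothetical, and the Cholesky route assumes the matrix is symmetric positive definite, as a regularized covariance matrix would be.

```python
import numpy as np


def make_spd(n, seed=0):
    """Build a random symmetric positive-definite test matrix (illustrative only)."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n, n))
    # a @ a.T is positive semi-definite; adding n * I makes it well conditioned.
    return a @ a.T + n * np.eye(n)


def invert_pinv(m):
    # Pseudoinverse via SVD; also works for singular matrices, but is the slowest.
    return np.linalg.pinv(m)


def invert_inv(m):
    # General-purpose inverse for any non-singular square matrix.
    return np.linalg.inv(m)


def invert_chol(m):
    # Assumes m is symmetric positive definite: m = L @ L.T, L lower triangular.
    low = np.linalg.cholesky(m)
    # Invert the factor by solving L @ X = I.
    low_inv = np.linalg.solve(low, np.eye(m.shape[0]))
    # inv(m) = inv(L).T @ inv(L)
    return low_inv.T @ low_inv


cov = make_spd(50)
for name, invert in [("pinv", invert_pinv), ("inv", invert_inv), ("inv(chol)", invert_chol)]:
    # Each method should reproduce the identity when multiplied with cov.
    print(name, np.allclose(invert(cov) @ cov, np.eye(50)))
```

On a well-conditioned matrix all three agree to numerical precision, so the choice between them is mainly a matter of runtime and of whether the input is guaranteed to be positive definite.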
The next bar "inv(chol) opt" uses the same inversion as above, but with some additional technical optimizations.
The last bar "julia" shows the runtime of the Julia implementation on 8 cores, with alignment compression.
Alignment compression has not been implemented in this version yet.