tzok / mcq-cli

Command line interface to MCQ

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Mean of Circular Quantities (MCQ). Command-line tools

Mean of Circular Quantities (MCQ) is a dissimilarity measure useful in comparison of 3D protein and/or RNA structures. It calculates an average difference between corresponding torsion angle values (rotations around bonds). More information can be found in:

Tomasz Zok, Mariusz Popenda, and Marta Szachniuk. 2014. "MCQ4Structures to Compute Similarity of Molecule Structures." Central European Journal of Operations Research 22(3):457–73.

Jakub Wiedemann, Tomasz Zok, Maciej Milostan, and Marta Szachniuk. 2017. "LCS-TA to Identify Similar Fragments in RNA 3D Structures." BMC Bioinformatics 18(1):456.

This project contains command-line tools to compare RNA 3D structures.

Usage

mcq-local

$ ./mcq-local

usage: mcq-local [-a <arg>] [-M <arg>] [-n <arg>] -t <arg> [-T <arg>]

 -a,--angles <arg>             Torsion angle types (separated by comma without
                               space), select from:
                               ALPHA,BETA,GAMMA,DELTA,EPSILON,ZETA,NU0,NU1,NU2
                               ,NU3,NU4,ETA,THETA,ETA_PRIM,THETA_PRIM,CHI,PSEU
                               DOPHASE_PUCKER. Default is:
                               ALPHA,BETA,GAMMA,DELTA,EPSILON,ZETA,CHI
 -M,--selection-model <arg>    Selection query for 3D RNA model
 -n,--names <arg>              Model names to be saved in output files
                               (separated by comma without space)
 -t,--target <arg>             Path to PDB file of the native 3D RNA target
 -T,--selection-target <arg>   Selection query for native 3D RNA target

selection query syntax is CHAIN:RESNUM-ICODE:COUNT e.g. A:1:5 means a
selection of 5 residues starting with residue number 1 in chain A

mcq-local is a tool which compares a set of 3D structures against a reference target. User can select a set of torsion angle types to be considered in the comparison process. Both the target and models can be aligned using selection queries.

The results are:

  • Ranking of models with respect to MCQ value
  • Detailed data about dissimilarity for each residue in every model
  • Visualization of secondary structure with color-coded dissimilarities in each residue for every model
$ ./mcq-local -t /tmp/rp13/13_0_solution_4XW7_rpr.pdb -T A:1:45 -M A:1:45 /tmp/rp13/13_{A,B,D,X}*.pdb

Partial results are available in: /tmp/bbbfa598-4adf-42b5-b457-fd89601fbb59
13_Das_2_rpr.pdb 19.07
13_Bujnicki_5_rpr.pdb 19.27
13_Bujnicki_8_rpr.pdb 19.60
...

$ ls -l /tmp/bbbfa598-4adf-42b5-b457-fd89601fbb59/
drwxr-xr-x 2 tzok tzok   100 03-14 14:43 13_Bujnicki_5_rpr
drwxr-xr-x 2 tzok tzok   100 03-14 14:43 13_Bujnicki_8_rpr
drwxr-xr-x 2 tzok tzok   100 03-14 14:43 13_Das_2_rpr
-rw-r--r-- 1 tzok tzok 39147 03-14 14:43 table.csv
...

Each directory with model name contains visualization and details about angular dissimilarity and MCQ for each residue.

A helper script colorbar.py can be used to produce colorbars from generated table.csv using ./colorbar.py -i table.csv.

mcq-lcs

$ ./mcq-lcs

usage: mcq-lcs [-M <arg>] -t <arg> [-T <arg>] -v <arg>

 -M,--selection-model <arg>       Selection query for 3D RNA model
 -t,--target <arg>                Path to PDB file of the native 3D RNA target
 -T,--selection-target <arg>      Selection query for native 3D RNA target
 -v,--mcq-threshold-value <arg>   Value of MCQ threshold in degrees

selection query syntax is CHAIN:RESNUM-ICODE:COUNT e.g. A:1:5 means a
selection of 5 residues starting with residue number 1 in chain A

mcq-lcs is a tool to find the longest continuous segment which is dissimilar up to a cerain MCQ threshold.

The result is a set of information about found segments alignment: size, coverage and nucleotide positions in the chains.

$ ./mcq-lcs -t /tmp/rp13/13_0_solution_4XW7_rpr.pdb /tmp/rp13/13_Das_2_rpr.pdb -v 15

MCQ value: 15°
Number of residues: 19
Coverage: 31.667% 
Target name: 13_0_solution_4XW7_rpr
First target residue: A.U7
Last target residue: A.G25
Model name: 13_Das_2_rpr
First model residue: A.A27
Last model residue: A.C45

mcq-global

$ ./mcq-global 

usage: mcq-global [-a <arg>] [-M <arg>]

 -a,--angles <arg>            Torsion angle types (separated by comma without
                              space), select from:
                              ALPHA,BETA,GAMMA,DELTA,EPSILON,ZETA,NU0,NU1,NU2,
                              NU3,NU4,ETA,THETA,ETA_PRIM,THETA_PRIM,CHI,PSEUDO
                              PHASE_PUCKER. Default is:
                              ALPHA,BETA,GAMMA,DELTA,EPSILON,ZETA,CHI
 -M,--selection-model <arg>   Selection query for 3D RNA model

selection query syntax is CHAIN:RESNUM-ICODE:COUNT e.g. A:1:5 means a
selection of 5 residues starting with residue number 1 in chain A

mcq-global tool is useful for comparison of multiple 3D structures without a reference one. A single MCQ value is computed for each pair of 3D structures. The selection of torsion angle types to be considered in the comparison can be changed using --angles switch. By default, the alignment of structures is done automatically, but user can provide a selection query to compute just a fragment.

The results are:

  • Distance matrix
  • Clustering (k-medoids)
  • Visualization
$ ./mcq-global /tmp/rp13/*.pdb

Comparing... 100% │████████████████████████████████│ 1081/1081 (0:00:09 / 0:00:00)
Results available in: /tmp/f3610dda-feec-460e-9cd4-ecf1c1b8466c

$ ls -l /tmp/f3610dda-feec-460e-9cd4-ecf1c1b8466c/
-rw-r--r-- 1 tzok tzok  5614 03-14 14:31 clustering.svg
-rw-r--r-- 1 tzok tzok   981 03-14 14:31 clustering.txt
-rw-r--r-- 1 tzok tzok 41926 03-14 14:31 matrix.csv

mcq-print

$ ./mcq-print

usage: mcq-print [-T <arg>]

 -T,--selection-target <arg>   Selection query for native 3D RNA target

selection query syntax is CHAIN:RESNUM-ICODE:COUNT e.g. A:1:5 means a
selection of 5 residues starting with residue number 1 in chain A

mcq-print is a utility tool to print out values of torsion angles in a TSV format.

$ ./mcq-print /tmp/rp13/13_0_solution_4XW7_rpr.pdb

Chain   ResNum  iCode   Name    alpha   beta        gamma   delta   epsilon     zeta    chi
A       1       -       G       -       -164.167    58.239  77.868  -153.691    -67.832 -170.106
A       2       -       G       -74.196 -178.465    57.158  79.161  -148.81     -69.172 -168.967
A       3       -       G       -63.156  173.711    54.802  76.679  -144.103    -80.547 -165.204
...

About

Command line interface to MCQ


Languages

Language:Java 89.0%Language:Python 6.2%Language:R 3.5%Language:Shell 1.3%