xinqyao / dbnn

Code related to my publication: Yao, XQ., Zhu, H. & She, ZS. A dynamic Bayesian network approach to protein secondary structure prediction. BMC Bioinformatics 9, 49 (2008). https://doi.org/10.1186/1471-2105-9-49

INSTALLATION
------------

# Requirements #
1. Operating system:
 
  *   LINUX / UNIX

2. Required software:

  *   MATLAB (v6.5 or later)
  *   Bayes Net Toolbox (BNT);
      can be downloaded from http://bnt.sourceforge.net/

3. Optional software:

  *  NCBI BLAST suite; 
     can be downloaded from http://www.ncbi.nlm.nih.gov/

NOTE: 
   
   If BLAST is not installed 
   locally, you can still use DBNN: 
   run BLAST and build your own 
   PSSM file elsewhere (e.g. with 
   online BLAST), and then provide 
   that PSSM file to DBNN.
     
# Compilation #
Type the following commands in the home 
directory of DBNN to compile and install
the package:

  make
  vi rundbnn
  ( edit the first FOUR variables 
    in the rundbnn file, i.e. homedir, matlabdir,
    blastdir, and dbname, to match 
    your own configuration, then save and exit )
  ./rundbnn # the first run checks 
            # that the required software 
            # is installed 

If you want to launch DBNN from directories other
than the home directory of DBNN, you need to
place the rundbnn program (or a symbolic link to it) 
in a directory on the system search path, 
and add the home directory of DBNN to the 
MATLAB search path. 
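
As a rough illustration of this step, the sketch below links rundbnn into a directory on PATH and exposes the DBNN home directory to MATLAB through the MATLABPATH environment variable. The function name link_rundbnn and the example paths are hypothetical, and relying on MATLABPATH is an assumption; calling addpath from within MATLAB is an alternative.

```shell
# link_rundbnn DBNN_HOME BIN_DIR   (hypothetical helper, not part of DBNN)
# Symlink rundbnn into BIN_DIR and add DBNN_HOME to the MATLAB search path.
link_rundbnn() {
    dbnn_home=$1
    bin_dir=$2

    mkdir -p "$bin_dir" || return 1
    # -f replaces a stale link left over from an earlier installation
    ln -sf "$dbnn_home/rundbnn" "$bin_dir/rundbnn" || return 1

    # Both settings last only for the current shell session
    PATH="$bin_dir:$PATH"
    MATLABPATH="$dbnn_home${MATLABPATH:+:$MATLABPATH}"
    export PATH MATLABPATH
}

# Example with assumed locations:
#   link_rundbnn "$HOME/dbnn" "$HOME/bin"
```

To make the settings permanent, the two export lines would have to go into your shell startup file.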



RUN
---------
 
# Quick start #
The simplest way to use DBNN is to type:

   ./rundbnn seq.fasta output

where seq.fasta contains your amino acid 
sequences in FASTA format, and output 
is the prefix of the prediction files. 
Typically, two prediction files are 
generated: 

      output.fasta 
and 
      output.raw 

The former contains the predictions 
in FASTA format, and the latter 
contains the secondary structure scores
for each residue site. Launching rundbnn 
also generates 

      query.pssm

which is the PSSM file corresponding 
to your seq.fasta. 
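
For a quick look at a prediction, the sketch below tallies how often each character occurs in the sequence lines of output.fasta. It assumes only that header lines start with ">" as in standard FASTA. The state letters used by DBNN are not specified here, so the function (ss_composition, a hypothetical name) counts whatever characters it finds.

```shell
# ss_composition FILE   (hypothetical helper, not part of DBNN)
# Print "<character> <count>" for every character found in the
# non-header lines of a FASTA file, sorted by character.
ss_composition() {
    awk '
        /^>/ { next }                      # skip FASTA header lines
        {
            for (i = 1; i <= length($0); i++)
                count[substr($0, i, 1)]++  # one tally per residue site
        }
        END { for (c in count) print c, count[c] }
    ' "$1" | LC_ALL=C sort
}

# Example:
#   ss_composition output.fasta
```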


By default, rundbnn reads parameters 
from eight files shipped with the DBNN
package: 

   default.M1.mat
   default.M2.mat
   default.M3.mat
   default.M4.mat
   default.nn1.lin
   default.nn1.sig
   default.nn2.lin
   default.nn2.sig

You can retrain DBNN (see below) 
and provide your own parameter files 
when launching rundbnn. For example, 
if your parameter files are named with
the prefix "newpara" (i.e. newpara.M1.mat,
... newpara.nn1.lin, ...),
you can pass them to rundbnn by typing:

  ./rundbnn seq.fasta output newpara
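
Before launching rundbnn with a custom prefix, it can help to confirm that all eight parameter files are present. The check below is a minimal sketch; check_params is a hypothetical helper name, and the suffix list simply mirrors the default file names listed above.

```shell
# check_params PREFIX   (hypothetical helper, not part of DBNN)
# Return 0 if all eight parameter files for PREFIX exist, 1 otherwise.
check_params() {
    prefix=$1
    missing=0
    for suffix in M1.mat M2.mat M3.mat M4.mat \
                  nn1.lin nn1.sig nn2.lin nn2.sig; do
        if [ ! -f "$prefix.$suffix" ]; then
            echo "missing parameter file: $prefix.$suffix" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example:
#   check_params newpara && ./rundbnn seq.fasta output newpara
```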


If you have not installed BLAST but 
have run BLAST and constructed the 
PSSM file somewhere else (e.g. online), 
you can use DBNN by typing: 

   ./rundbnn yourPSSM output [ prefix-of-parameter-files ]
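
To process several sequence files with the quick-start command, a loop such as the one below may be convenient. It is a sketch under two assumptions: that rundbnn can be called by the path or name given as the first argument, and that query.pssm is written to the current working directory, so it is renamed after each run to avoid being overwritten. The function name run_batch and the directory layout are hypothetical.

```shell
# run_batch RUNDBNN INPUT_DIR OUTPUT_DIR   (hypothetical helper)
# Run rundbnn on every *.fasta file in INPUT_DIR and collect the
# prediction files, plus the per-sequence PSSM, in OUTPUT_DIR.
run_batch() {
    rundbnn=$1
    in_dir=$2
    out_dir=$3
    failed=0

    mkdir -p "$out_dir" || return 1
    for fasta in "$in_dir"/*.fasta; do
        [ -e "$fasta" ] || continue          # no matching files
        name=$(basename "$fasta" .fasta)
        if "$rundbnn" "$fasta" "$out_dir/$name"; then
            # Assumption: query.pssm lands in the working directory
            if [ -f query.pssm ]; then
                mv query.pssm "$out_dir/$name.pssm"
            fi
        else
            echo "rundbnn failed for $fasta" >&2
            failed=1
        fi
    done
    return "$failed"
}

# Example:
#   run_batch ./rundbnn sequences predictions
```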


# Run DBN and NN separately #
1. Run DBN
type:
   
   ./dbnpred PSSM output [ prefix-of-parameter-files ]

When using dbnpred, you should always 
provide the PSSM file and the prefix 
of the parameter files explicitly. 
Two files will be generated: 

   output.fasta 
   output.raw

where output.fasta contains the 
predicted secondary structure 
in FASTA format, and 
output.raw contains the predicted 
posterior probability distribution 
for each residue.

2. Run NN
type:
  
  ./nnpred PSSM output parameters

Again, two files, output.fasta and 
output.raw, will be generated; 
this time output.raw contains the 
raw outputs of the neural network.


# Training of DBN and NN #
1. Training of DBN
type:

  ./dbntrain PSSM secstr.fasta parameters

where secstr.fasta is the annotation 
of secondary structure for your proteins 
(in FASTA format) and "parameters" is a 
user-defined prefix for parameter files 
that will be generated. Four files will 
be generated:

   parameters.M1.mat
   parameters.M2.mat
   parameters.M3.mat
   parameters.M4.mat


2. Training of NN
type:

   ./nntrain PSSM secstr.fasta parameters

Four files will be generated:
   
   parameters.nn1.lin
   parameters.nn1.sig
   parameters.nn2.lin
   parameters.nn2.sig
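
The two training steps can be chained so that both sets of parameter files share one prefix, which can then be passed to rundbnn. The wrapper below is a sketch only: train_all is a hypothetical name, and it assumes that dbntrain and nntrain can be called by their full path inside the DBNN home directory.

```shell
# train_all DBNN_HOME PSSM SECSTR PREFIX   (hypothetical helper)
# Train the DBN, then the NN, writing all parameter files under PREFIX.
# Stops at the first step that fails.
train_all() {
    dbnn_home=$1
    pssm=$2
    secstr=$3
    prefix=$4

    "$dbnn_home/dbntrain" "$pssm" "$secstr" "$prefix" || return 1
    "$dbnn_home/nntrain"  "$pssm" "$secstr" "$prefix" || return 1
    echo "parameter files written with prefix: $prefix"
}

# Example, followed by a prediction with the new parameters:
#   train_all . train.pssm secstr.fasta newpara &&
#       ./rundbnn seq.fasta output newpara
```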



=================
Xin-Qiu Yao
Jan. 7, 2007
