ugur-koc / Neigbor-Joining

Computational Genomics

This file briefly describes the files in this repository. For details on how to run a script, see the comment block at the top of that file.


To compile the project:

> make

To run the neighbor-joining algorithm:

> ./main <INPUT> <OUTPUT> <VERBOSE>

where

INPUT:    file containing protein names and a distance matrix (see the data/*.dist files for the format)
OUTPUT:   file to store the computed phylogenetic tree
VERBOSE:  type "verbose" to print debugging logs
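
For orientation, the neighbor-joining procedure can be sketched in a few lines of Python. This is only an illustrative sketch of the textbook algorithm, not the code in joiner.cpp: the function name, the edge-list return value, and the internal node labels ("U1", "U2", ...) are assumptions made for this example, and it expects at least two taxa and a symmetric distance matrix.

```python
def neighbor_joining(names, dist):
    """Textbook neighbor joining (illustrative sketch, not joiner.cpp).

    names: list of taxon labels (at least two)
    dist:  symmetric matrix (list of lists) of pairwise distances
    Returns an unrooted tree as a list of (node, node, branch_length) edges.
    Internal nodes get hypothetical labels "U1", "U2", ...
    """
    nodes = list(names)
    d = [[float(x) for x in row] for row in dist]
    edges = []
    next_id = 1

    while len(nodes) > 2:
        n = len(nodes)
        totals = [sum(row) for row in d]

        # Pick the pair (i, j) minimizing Q(i, j) = (n - 2) d(i, j) - r(i) - r(j).
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - totals[i] - totals[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best

        # Branch lengths from the two joined nodes to their new parent.
        li = 0.5 * d[i][j] + (totals[i] - totals[j]) / (2.0 * (n - 2))
        lj = d[i][j] - li
        parent = "U%d" % next_id
        next_id += 1
        edges.append((parent, nodes[i], li))
        edges.append((parent, nodes[j], lj))

        # Replace i and j by the parent and update the distance matrix.
        keep = [k for k in range(n) if k != i and k != j]
        new_row = [0.5 * (d[i][k] + d[j][k] - d[i][j]) for k in keep]
        d = [[d[a][b] for b in keep] + [new_row[x]] for x, a in enumerate(keep)]
        d.append(new_row + [0.0])
        nodes = [nodes[k] for k in keep] + [parent]

    # Two nodes remain; connect them with a single edge.
    edges.append((nodes[0], nodes[1], d[0][1]))
    return edges
```

When the input distances are additive (they fit some tree exactly), the path lengths between leaves in the returned tree reproduce the input matrix, which makes a convenient sanity check.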



Source Files:
---------------------aligntodist_pw.pl-------------------
This script creates distance matrices from pairwise global alignment scores

---------------------pairwisealignment.sh----------------
This script creates pairwise alignments

---------------------aligntodist.pl----------------------
This script creates distance matrices using the distmat program from the EMBOSS package

---------------------fasta_parser.sh---------------------
This is a helper script used to create .fasta files for each protein sequence 

---------------------experiments.pl----------------------
This script runs the experiments conducted in this project. It takes one argument: 1, 2, or 3
# input: 1, 2, or 3; 1 to run the distmat experiment,
#                    2 to run the pairwise global score experiment,
#                    and 3 to run the protdist experiment

---------------------compareTrees.py----------------------
This script converts the output tree of our NJ implementation into Newick format
and also generates the accuracy score for two given phylogenetic trees

---------------------makefile---------------------
Creates executables

---------------------joiner.h---------------------
Header file of joiner.cpp

---------------------joiner.cpp---------------------
This file contains the main Neighbor-Joining algorithm

---------------------main.cpp---------------------
Main source file of the Neighbor-Joining implementation

---------------------protdist.c---------------------
This is a modified version of PHYLIP's protdist program.
It is used only in the experiments.pl script.

---------------------phylip.c---------------------
A library from PHYLIP

---------------------phylip.h---------------------
A header from PHYLIP
--------------------------------------------------

Binaries:
---------------------main-------------------------
Executable built from main.cpp

---------------------protdist---------------------
Executable built from protdist.c

Other files:
---------------------infile-----------------------
It has the same content as 85VAST.aln, only in a different format. It is used in the protdist experiment.

---------------------README-----------------------
This file

Directories:
---------------------data-------------------------
Contains all data used in and produced by this project

---------------------report-----------------------
Contains the source of final report


Dependencies:
To run the experiments of this project, the following software must be installed:
EMBOSS, PHYLIP
Python libraries: ete2, networkx
Perl libraries: Time::HiRes

NOTE:  there are some hard-coded directory paths in the code!
They are generally mentioned in the comments at the top of each file.
