racheliee / algo_PA2

Finds Longest Common Subsequence of 2 to 5 DNA Sequences using dynamic programming

algorithm PA2

23-2 Algorithm Project_2

Goal

Implement an algorithm for multiple sequence alignment of k DNA sequences using dynamic programming.

TODO

  • four-letter alphabet {Adenine (A), Thymine (T), Guanine (G), Cytosine (C)}
  • measure the similarity of genetic sequences by the number of exactly matched characters
  • align the k sequences; gaps may be inserted into any sequence
  • write a C program that reads the k DNA sequences from the input file named ‘hw2_input.txt’, finds the best sequence alignment, and prints the result to the output file named ‘hw2_output.txt’
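
As a starting point, the two-sequence case can be sketched with a classic longest-common-subsequence table, since maximizing exactly matched columns for k = 2 is the LCS problem. This is a minimal sketch, not the repository's implementation: the helper name `align_pair` and the buffer sizes are assumptions, and extending it to k sequences would need one table dimension per sequence.

```c
#include <string.h>

#define MAXN 120  /* maximum sequence length stated in the assignment */

/* Hypothetical helper: aligns a and b, writes the gapped strings into
 * out_a and out_b (each needs room for 2 * MAXN + 1 chars), and returns
 * the number of matched columns. */
int align_pair(const char *a, const char *b, char *out_a, char *out_b)
{
    static int dp[MAXN + 1][MAXN + 1];
    int n = (int)strlen(a), m = (int)strlen(b);

    /* dp[i][j] = most matched columns when aligning a[i..] with b[j..] */
    for (int i = n; i >= 0; i--) {
        for (int j = m; j >= 0; j--) {
            if (i == n || j == m)
                dp[i][j] = 0;
            else if (a[i] == b[j])
                dp[i][j] = dp[i + 1][j + 1] + 1;
            else
                dp[i][j] = dp[i + 1][j] > dp[i][j + 1] ? dp[i + 1][j]
                                                       : dp[i][j + 1];
        }
    }

    /* Walk forward through the table to emit the aligned strings */
    int i = 0, j = 0, len = 0;
    while (i < n && j < m) {
        if (a[i] == b[j]) {
            out_a[len] = a[i++];
            out_b[len] = b[j++];
        } else if (dp[i + 1][j] >= dp[i][j + 1]) {
            out_a[len] = a[i++];   /* gap in b */
            out_b[len] = '-';
        } else {
            out_a[len] = '-';      /* gap in a */
            out_b[len] = b[j++];
        }
        len++;
    }
    /* Pad whichever sequence still has characters left */
    while (i < n) { out_a[len] = a[i++]; out_b[len++] = '-'; }
    while (j < m) { out_a[len] = '-'; out_b[len++] = b[j++]; }
    out_a[len] = out_b[len] = '\0';
    return dp[0][0];
}
```

Calling `align_pair("ATTGCCATT", "ATCCAAT", out_a, out_b)` fills both buffers with equal-length gapped strings. Mismatched characters are never placed in the same column here, which keeps the traceback simple at the cost of a longer alignment.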

Input file consists of

  • first part : the number (k) of DNA sequences to be aligned
  • second part : the k DNA sequences to be aligned (each sequence appears on a separate line of text)
  • Each part is separated from the next part by a character $
  • 2 ≤ k ≤ 5, and 1 ≤ n ≤ 120 where n is the maximum length of each DNA sequence
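
The format above could be read along these lines. This is only a sketch under assumptions: the helper name `read_sequences`, the `-1` error convention, and the rejection of characters outside A, T, G, C are illustrative choices, not requirements stated by the assignment.

```c
#include <stdio.h>
#include <string.h>

#define MAXK 5    /* upper bound on k from the assignment */
#define MAXN 120  /* upper bound on sequence length from the assignment */

/* Hypothetical helper: reads k and the k sequences from fp.
 * Returns k on success, or -1 if the input does not match the format. */
int read_sequences(FILE *fp, char seqs[MAXK][MAXN + 1])
{
    int k;
    char sep[2];

    if (fscanf(fp, "%d", &k) != 1 || k < 2 || k > MAXK)
        return -1;
    /* %1s skips whitespace, then reads the one-character separator */
    if (fscanf(fp, "%1s", sep) != 1 || sep[0] != '$')
        return -1;
    for (int i = 0; i < k; i++) {
        /* %120s caps each read at MAXN characters and ignores
         * spaces or newlines around the sequence */
        if (fscanf(fp, "%120s", seqs[i]) != 1)
            return -1;
        /* Reject anything outside the four-letter alphabet */
        if (strspn(seqs[i], "ATGC") != strlen(seqs[i]))
            return -1;
    }
    return k;
}
```

A caller would open ‘hw2_input.txt’ with `fopen`, pass the resulting `FILE *` to `read_sequences`, and check for `-1` before aligning.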

Output file consists of

  • the aligned sequences, one per line, followed by a line of marks showing the matched characters
  • in the last line, mark “*” on each column whose characters are identical across all sequences
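
Once the aligned rows exist, the final mark line can be produced with a single pass over the columns. The sketch below assumes the rows are equal-length strings, that a column containing a gap is never starred, and uses a hypothetical helper name `mark_columns`.

```c
#include <string.h>

/* Hypothetical helper: rows holds k aligned strings of equal length.
 * marks must have room for that length plus the terminator.
 * Returns the number of starred columns. */
int mark_columns(const char *rows[], int k, char *marks)
{
    size_t len = strlen(rows[0]);
    int count = 0;

    for (size_t c = 0; c < len; c++) {
        int same = rows[0][c] != '-';   /* a gap column never counts */
        for (int r = 1; r < k && same; r++)
            same = rows[r][c] == rows[0][c];
        marks[c] = same ? '*' : ' ';
        count += same;
    }
    marks[len] = '\0';
    return count;
}
```

The return value is the count of identical columns, which is the quantity the alignment tries to maximize.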

Example of input / output file

[Input file: hw2_input.txt] 
3
$
ATTGCCATT
ATGGCCATT 
ATCCAAT

[Output file: hw2_output.txt] 
ATTGCCA-TT
ATGGCCA-TT
AT--CCAAT-
**  *** * 

Judgement

  • the number of columns with identical characters across all sequences in the alignment returned by your submitted program
  • the actual running time
  • a well-written document that explains your source code and analyzes the performance of your algorithm
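
For the performance analysis, a rough size estimate may be a useful first step. The sketch below assumes the straightforward layout of one DP table with a dimension per sequence, which the assignment does not prescribe; `table_cells` is a hypothetical helper name.

```c
/* Hypothetical helper: number of cells in a full k-dimensional DP table
 * when every sequence has length n, i.e. (n + 1)^k. */
unsigned long long table_cells(int n, int k)
{
    unsigned long long cells = 1;
    for (int i = 0; i < k; i++)
        cells *= (unsigned long long)(n + 1);
    return cells;
}
```

With n = 120 this gives 14,641 cells for k = 2 but 25,937,424,601 cells for k = 5, so under this assumption the largest inputs may need a pruned or heuristic approach rather than a full table.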


Languages

Language:C 100.0%