Let's say we have two DNA sequences, GATTACA and GCATGCG, and we want to find the optimal alignments of the two:
G-ATTACA | G-ATTACA | G-ATTACA
GCA-TGCG | GCAT-GCG | GCATG-CG
We calculate the alignment score as follows. We begin with a score of 0:
- If two letters are the same, we add m (match) to the score
- If two letters are different, we subtract d (differ) from the score
- If we add a gap, we subtract g (gap) from the score
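The scoring rules can be sketched as a small function. This is a minimal illustration, not the project's own code: the name alignment_score and the default values m = d = g = 1 are assumptions chosen for the example, since the text does not fix concrete values.

```python
def alignment_score(top, bottom, m=1, d=1, g=1):
    """Score two gapped strings of equal length, column by column.

    m, d and g default to 1 here; these defaults are an assumption
    made for illustration, not values given in the text.
    """
    if len(top) != len(bottom):
        raise ValueError("aligned strings must have the same length")
    score = 0  # we begin with a score of 0
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            score -= g  # a gap costs g
        elif x == y:
            score += m  # a match earns m
        else:
            score -= d  # a mismatch costs d
    return score


if __name__ == "__main__":
    for top, bottom in [("G-ATTACA", "GCA-TGCG"),
                        ("G-ATTACA", "GCAT-GCG"),
                        ("G-ATTACA", "GCATG-CG")]:
        print(top, bottom, alignment_score(top, bottom))
```

Under these assumed scores, the three alignments listed earlier all get the same score, which is what makes them equally optimal.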
To find all possible alignments:
- Split the first sequence in half
- Calculate the last line of the Needleman-Wunsch matrix for the first half and the whole second sequence
- Calculate the last line of the Needleman-Wunsch matrix for the reversed second half of the first sequence and the whole reversed second sequence
- Add the elements of step 2 and the reversed step 3 together
- Find the indexes of the elements equal to the max element of step 4
Then recursively split the first sequence in half and the second one at each index found in step 5, until the length of a sequence is 0 or 1.
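The five steps above can be sketched as follows. The function names nw_last_row and split_points are hypothetical, and the scores m = d = g = 1 are again assumed for illustration. Only two rows of the matrix are kept in memory at any time, which is where the space saving comes from.

```python
def nw_last_row(a, b, m=1, d=1, g=1):
    """Return the last line of the Needleman-Wunsch matrix for a vs b.

    Only the previous and the current row are stored.
    The default scores are an assumption for illustration.
    """
    prev = [-g * j for j in range(len(b) + 1)]  # row for the empty prefix of a
    for x in a:
        curr = [prev[0] - g]
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (m if x == y else -d)  # match or mismatch
            curr.append(max(diag, prev[j] - g, curr[j - 1] - g))
        prev = curr
    return prev


def split_points(a, b, m=1, d=1, g=1):
    """Indexes at which b can be cut when a is cut in half (steps 1 to 5)."""
    mid = len(a) // 2                                        # step 1
    left = nw_last_row(a[:mid], b, m, d, g)                  # step 2
    right = nw_last_row(a[mid:][::-1], b[::-1], m, d, g)     # step 3
    totals = [l + r for l, r in zip(left, reversed(right))]  # step 4
    best = max(totals)
    return [j for j, t in enumerate(totals) if t == best]    # step 5


if __name__ == "__main__":
    print(split_points("GATTACA", "GCATGCG"))
```

With these assumed scores, the call in the example should give [3, 4]: the second sequence can be cut as GCA | TGCG or as GCAT | GCG, matching where the example alignments place the gaps around the first half GAT.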
The recursion terminates when you end up with a sequence of length 0 or 1. In the first case, add gaps to the alignment. In the second, you have to apply the Needleman-Wunsch algorithm, but this time the matrix will have either 2 rows or 2 columns.
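Putting the recursion and its base cases together gives a sketch like the one below. It is a simplified illustration under stated assumptions: it follows only the first best split index, so it returns one optimal alignment rather than all of them, and the names hirschberg, needleman_wunsch and nw_last_row as well as the scores m = d = g = 1 are hypothetical choices for the example.

```python
def nw_last_row(a, b, m=1, d=1, g=1):
    """Last line of the Needleman-Wunsch matrix, keeping two rows in memory."""
    prev = [-g * j for j in range(len(b) + 1)]
    for x in a:
        curr = [prev[0] - g]
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (m if x == y else -d)
            curr.append(max(diag, prev[j] - g, curr[j - 1] - g))
        prev = curr
    return prev


def needleman_wunsch(a, b, m=1, d=1, g=1):
    """Full-matrix alignment; used here only when a or b has length 1,
    so the matrix has just 2 rows or 2 columns."""
    f = [[-g * (i + j) if i == 0 or j == 0 else 0
          for j in range(len(b) + 1)] for i in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            diag = f[i - 1][j - 1] + (m if a[i - 1] == b[j - 1] else -d)
            f[i][j] = max(diag, f[i - 1][j] - g, f[i][j - 1] - g)
    top, bottom, i, j = [], [], len(a), len(b)
    while i > 0 or j > 0:  # trace back from the bottom-right corner
        if i > 0 and j > 0 and f[i][j] == f[i - 1][j - 1] + (
                m if a[i - 1] == b[j - 1] else -d):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and f[i][j] == f[i - 1][j] - g:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom))


def hirschberg(a, b, m=1, d=1, g=1):
    """Return one optimal alignment of a and b as (top, bottom)."""
    if len(a) == 0:                      # length 0: only gaps are possible
        return "-" * len(b), b
    if len(b) == 0:
        return a, "-" * len(a)
    if len(a) == 1 or len(b) == 1:       # length 1: small Needleman-Wunsch
        return needleman_wunsch(a, b, m, d, g)
    mid = len(a) // 2
    left = nw_last_row(a[:mid], b, m, d, g)
    right = nw_last_row(a[mid:][::-1], b[::-1], m, d, g)
    totals = [l + r for l, r in zip(left, reversed(right))]
    cut = totals.index(max(totals))      # first best index only (simplification)
    top_l, bot_l = hirschberg(a[:mid], b[:cut], m, d, g)
    top_r, bot_r = hirschberg(a[mid:], b[cut:], m, d, g)
    return top_l + top_r, bot_l + bot_r


if __name__ == "__main__":
    print(*hirschberg("GATTACA", "GCATGCG"), sep="\n")
```

To enumerate every optimal alignment instead, the recursion would have to branch on each index where the summed line reaches its maximum, not just the first one.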
This is an efficient way to reduce the space complexity of the Needleman-Wunsch algorithm, invented by Dan Hirschberg. Below, m and n are the lengths of the two sequences.
Needleman-Wunsch:
- Time complexity: O(mn)
- Space complexity: O(mn)
Hirschberg:
- Time complexity: O(mn)
- Space complexity: O(min{m, n})