|
1
|
- An alternative strategy for database search and alignment
|
|
2
|
- Faster than quadratic time
- Alignment isn’t always optimal\
- Strategy
- Preprocess 2 sequences to determine the most likely offset between
them. This essentially a local alignment extending the length of the
shorter sequence
- Use that info to define a restricted alignment space and employ a
global alignment within that space
|
|
3
|
- Pick an arbitrary window size, say, 4 nucleic acid bases (4-mer)
- Line up the two sequences and record a score for all possible 4-mer
congruences.
- Slide one sequence one base and repeat
- Continue until overlap is exhausted
- The highest score is the most likely offset
|
|
4
|
- There are 256 4-mers of nucleotides (44). Create a table and use the 4-mer to
index the table
- Slide the 4-wide window along both the query and the target. At each 1
base increment, open up the table cell for the 4-mer in the window and
record
- Whether it came from query or
target
- The increment (ie the starting position of the 4-mer in the string)
|
|
5
|
|
|
6
|
|
|
7
|
- Create a vector of cells, one for
each possible offset between the query and the target. Initialize to 0.
- Walk the table. For every entry with an s, if there is a t or t’s,
compute all the differences (s’s-t’s) and increment the cells in the
score vector indexed by each of those differences.
- Select the highest scoring value in the score vector. Its index ( may be
+ or -) tells what the most
likely global alignment is, with
respect to the main diagonal of the dynamic programming matrix.
|
|
8
|
|
|
9
|
|
|
10
|
|
|
11
|
|
|
12
|
- The implementation of the conceptual idea of sliding would be O(n2)
where n is the length of the longer sequence
- The use of the table lookup in FastA reduces the complexity to O(n) for
an unbiased table
- If the channel around the diagonal for dynamic programming is opened,
then the dynamic programming costs increase accordingly
|