Notes
Slide Show
Outline
1
FastA
  • An alternative strategy for database search and alignment
2
FastA
  • Faster than quadratic time
  • Alignment isn’t always optimal
  • Strategy
    • Preprocess 2 sequences to determine the most likely offset between them
    • Use that info to restrict the area of the matrix used for dynamic programming
3
FastA
Preprocessing step-CONCEPTUAL
Runs in QUADRATIC time
  • Pick an arbitrary window size, say, 4 nucleic acid bases (4-mer)
  • Line up the two sequences and record a score for all possible 4-mer congruences.
  • Slide one sequence one base and repeat
  • Continue until overlap is exhausted
  • The highest score is the most likely offset
4
FastA
Preprocessing step-ACTUAL
  • There are 256 4-mers of nucleotides (44).  Create a table and use the 4-mer to index the table
  • Slide the 4-wide window along both the query and the target. At each 1 base increment, open up the table cell for the 4-mer in the window and record
    • Whether  it came from query or target
    • The increment (ie the starting position of the 4-mer in the string)
5
 
6
FastA
Preprocessing step-ACTUAL
  • Create a vector of cells,  one for each possible offset between the query and the target.   Initialize to 0.
  • Walk the table. For every entry with an s, if there is a t or t’s, compute all the differences (s’s-t’s) and increment the cells in the score vector indexed by each of those differences.
  • Select the highest scoring value in the score vector. Its index ( may be + or -)  tells what the most likely global alignment is,  with respect to the main diagonal of the dynamic programming matrix.


7
 
8
FastA
Processing step
9
FastA
Processing step
  • The cells that are shown grey here have infinitely negative values.  This forms a ‘channel’ around the offset diagonal. No dynamic programming can proceed. If there are to be no gaps, we are done.
10
FastA
Processing step
  • If we widen the channel, then the dynamic programming can operate around the cells of the offset diagonal, inserting gaps as needed.  The wider, the box, the more dynamic programming
11
FastA
Complexity
  • The implementation of the conceptual idea of sliding would be O(n2) where n is the length of the longer sequence
  • The use of the table lookup in FastA reduces the complexity to O(n) for an unbiased table
  • If the channel around the diagonal for dynamic programming is opened, then the dynamic programming costs increase accordingly