Notes
Slide Show
Outline
1
FastA
  • An alternative strategy for database search and alignment
2
FastA
  • Faster than quadratic time
  • Alignment isn’t always optimal\


  • Strategy


    • Preprocess 2 sequences to determine the most likely offset between them. This essentially a local alignment extending the length of the shorter sequence


    • Use that info to define a restricted alignment space and employ a global alignment within that space
3
FastA
Preprocessing step-CONCEPTUAL
Runs in QUADRATIC time
  • Pick an arbitrary window size, say, 4 nucleic acid bases (4-mer)
  • Line up the two sequences and record a score for all possible 4-mer congruences.
  • Slide one sequence one base and repeat
  • Continue until overlap is exhausted
  • The highest score is the most likely offset
4
FastA
Preprocessing step-ACTUAL
  • There are 256 4-mers of nucleotides (44).  Create a table and use the 4-mer to index the table
  • Slide the 4-wide window along both the query and the target. At each 1 base increment, open up the table cell for the 4-mer in the window and record
    • Whether  it came from query or target
    • The increment (ie the starting position of the 4-mer in the string)
5
 
6
 
7
FastA
Preprocessing step-ACTUAL
  • Create a vector of cells,  one for each possible offset between the query and the target.   Initialize to 0.
  • Walk the table. For every entry with an s, if there is a t or t’s, compute all the differences (s’s-t’s) and increment the cells in the score vector indexed by each of those differences.
  • Select the highest scoring value in the score vector. Its index ( may be + or -)  tells what the most likely global alignment is,  with respect to the main diagonal of the dynamic programming matrix.


8
 
9
FastA
Processing step
10
 
11
 
12
FastA
Complexity
  • The implementation of the conceptual idea of sliding would be O(n2) where n is the length of the longer sequence
  • The use of the table lookup in FastA reduces the complexity to O(n) for an unbiased table
  • If the channel around the diagonal for dynamic programming is opened, then the dynamic programming costs increase accordingly