Notes
Slide Show
Outline
1
SUFFIX TREES
  • AN  INTRODUCTION
2
Definitions
  • Suffix: Given a string A of  length j, with characters represented by A[1] A[2] A[3] . . . A[j],  a suffix of A is a substring represented by A[i] A[i+1] A[i+2] . . . A[j], where i is an integer and 1<= i <= j.
3
Definitions, cont.
  • Suffix Tree: A tree representation of the suffixes of a string such that each unique character in the string has a main branch off the root node of the tree, and there is a one-to-one correspondence between suffixes of the string and leaves of the suffix tree.
4
 
5
Background
  • Weiner and McCreight separately developed linear-time algorithms for constructing suffix trees in the 1970s, but the complexity of the algorithms limited the widespread use of suffix trees.
  • In the mid 1990s Ukkonen revisited suffix trees and published a more straightforward linear-time algorithm for their construction.
  • Dan Gusfield, author of Algorithms on Strings, Trees, and Sequences, states that “We know of no other single data structure (other than those essentially equivalent to suffix trees) that allows efficient solutions to such a wide range of complex string problems.”


    • Exact String Matching
    • Repeating Substrings of a Single String
    • Common Substrings of Two or More Strings

6
Suffix Tree Construction
  • We’ll follow Ukkonen’s linear-time algorithm, as described by Mark Nelson in “Fast String Searching with Suffix Trees” and by Dan Gusfield in Algorithms on Strings, Trees, and Sequences.
7
 
8
 
9
 
10
 
11
 
12
 
13
 
14
 
15
 
16
 
17
 
18
 
19
 
20
Suffix Extension Rules                (For b a suffix of S[1…i])
  • Rule 1: If b ends at a leaf, update its label by adding character S(i+1).
21
Suffix Extension Rules                (For b a suffix of S[1…i])
  • Rule 2: If some path from the end of suffix b starts with character S(i+1), do nothing (other than mark active site of an implicit node).
22
Suffix Extension Rules                (For b a suffix of S[1…i])
  • Rule 3: If no path from the end of suffix b starts with character S(i+1), but at least one labeled path continues from the end of b, create an internal explicit node from the existing implicit node and create a new leaf off the internal node.
23
Implementation tricks
  • Suffix Links
24
Implementation Tricks, cont.
  • Edge-Label Compression
  • Global index e, updated once in each phase
25
Some Sample Uses
of Suffix Trees
26
 
27
 
28
 
29
In Conclusion
  • Suffix tree construction done in linear time.
  • String matching done in time proportional to length of search string (not suffix tree to be searched).
  • Suffix tree versatile for solving variety of string-matching problems.