|
1
|
|
|
2
|
- Suffix: Given a string A of
length j, with characters represented by A[1] A[2] A[3] . . .
A[j], a suffix of A is a
substring represented by A[i] A[i+1] A[i+2] . . . A[j], where i is an
integer and 1<= i <= j.
|
|
3
|
- Suffix Tree: A tree representation of the suffixes of a string such that
each unique character in the string has a main branch off the root node
of the tree, and there is a one-to-one correspondence between suffixes
of the string and leaves of the suffix tree.
|
|
4
|
|
|
5
|
- Weiner and McCreight separately developed linear-time algorithms for
constructing suffix trees in the 1970s, but the complexity of the
algorithms limited the widespread use of suffix trees.
- In the mid 1990s Ukkonen revisited suffix trees and published a more
straightforward linear-time algorithm for their construction.
- Dan Gusfield, author of Algorithms on Strings, Trees, and Sequences, states
that “We know of no other single data structure (other than those
essentially equivalent to suffix trees) that allows efficient solutions
to such a wide range of complex string problems.”
- Exact String Matching
- Repeating Substrings of a Single String
- Common Substrings of Two or More Strings
|
|
6
|
- We’ll follow Ukkonen’s linear-time algorithm, as described by Mark
Nelson in “Fast String Searching with Suffix Trees” and by Dan Gusfield
in Algorithms on Strings, Trees, and Sequences.
|
|
7
|
|
|
8
|
|
|
9
|
|
|
10
|
|
|
11
|
|
|
12
|
|
|
13
|
|
|
14
|
|
|
15
|
|
|
16
|
|
|
17
|
|
|
18
|
|
|
19
|
|
|
20
|
- Rule 1: If b ends at a leaf, update its label by adding character S(i+1).
|
|
21
|
- Rule 2: If some path from the end of suffix b starts with character S(i+1),
do nothing (other than mark active site of an implicit node).
|
|
22
|
- Rule 3: If no path from the end of suffix b starts with character S(i+1),
but at least one labeled path continues from the end of b, create an
internal explicit node from the existing implicit node and create a new
leaf off the internal node.
|
|
23
|
|
|
24
|
- Edge-Label Compression
- Global index e, updated once in each phase
|
|
25
|
|
|
26
|
|
|
27
|
|
|
28
|
|
|
29
|
- Suffix tree construction done in linear time.
- String matching done in time proportional to length of search string
(not suffix tree to be searched).
- Suffix tree versatile for solving variety of string-matching problems.
|