






An overview of indexed search trees (tries), a data structure used for searching strings. The authors discuss the concept of tries, their advantages, and different types such as standard, compressed, and compact tries. They also provide examples and applications in areas like Morse code, web search engines, and computational biology.







Key C can be decomposed into a sequence of subkeys C1, C2, …, Cn
Redundancy exists between subkeys
Store a subkey at each node
The path through the trie yields the full key
Huffman tree
Tries
A string decomposes into a sequence of letters
Example: “ART” ⇒ “A” “R” “T”
Less overhead than hashing
Exploiting redundancy
Explicitly storing substrings
Types of Tries
Standard: single character per node
Compressed: eliminates chains of nodes
Compact: stores indices into the original string(s)
Suffix: stores all suffixes of a string
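The step from a standard trie to a compressed one can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the nested-dict representation, the `END` marker, and the names `build_standard` and `compress` are all hypothetical.

```python
END = "$"  # hypothetical end-of-word marker; assumes "$" is not in the alphabet


def build_standard(words):
    """Standard trie as nested dicts: one character per edge."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = {}  # mark that a word ends at this node
    return root


def compress(node):
    """Merge each chain of single-child nodes into one edge labelled with a substring."""
    out = {}
    for label, child in node.items():
        # Extend the label while the child has exactly one outgoing edge
        # and that edge is not the end-of-word marker.
        while len(child) == 1 and END not in child:
            (next_label, next_child), = child.items()
            label += next_label
            child = next_child
        out[label] = compress(child)
    return out


compressed = compress(build_standard(["stock", "stop"]))
# compressed == {"sto": {"ck": {"$": {}}, "p": {"$": {}}}}
```

The chain s, t, o collapses into the single edge "sto", and c, k into "ck"; branching nodes and word endings are left in place.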
Standard Trie Example
{ bear, bell, bid, bull, buy, sell, stock, stop }
[Figure: standard trie storing the eight strings above, one character per node]
Standard Tries
Each node stores
A value between 1…m
References to m children (array or linked list)

class Node {
    Letter value;     // a letter from V = { V1, V2, …, Vm }
    Node child[m];    // references to the m children
}
Standard Tries
Uses O(n) space
Supports search / insert / delete in O(d × m) time
where
n = total size of the strings indexed by the trie
d = length of the parameter string
m = size of the alphabet
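The three operations can be sketched on a node that holds a letter and an array of m child references, as in the node layout above. Several details are assumptions made for the example: the lowercase alphabet a to z (m = 26), the `is_word` flag that marks the end of a key, and the class and method names.

```python
M = 26  # assumed alphabet size: lowercase letters a..z


def index_of(ch):
    """Map a lowercase letter to its slot 0..M-1 in the child array."""
    i = ord(ch) - ord("a")
    if not 0 <= i < M:
        raise ValueError(f"letter outside assumed alphabet: {ch!r}")
    return i


class Node:
    def __init__(self, value=None):
        self.value = value          # letter stored at this node
        self.child = [None] * M     # references to m children
        self.is_word = False        # assumption: marks the end of a stored key


class Trie:
    def __init__(self):
        self.root = Node()

    def _path(self, word):
        """Nodes visited while spelling `word`, or None if the path breaks."""
        path = [self.root]
        for ch in word:
            nxt = path[-1].child[index_of(ch)]
            if nxt is None:
                return None
            path.append(nxt)
        return path

    def insert(self, word):
        node = self.root
        for ch in word:
            i = index_of(ch)
            if node.child[i] is None:
                node.child[i] = Node(ch)
            node = node.child[i]
        node.is_word = True

    def search(self, word):
        path = self._path(word)
        return path is not None and path[-1].is_word

    def delete(self, word):
        path = self._path(word)
        if path is None or not path[-1].is_word:
            return False
        path[-1].is_word = False
        # Prune nodes that no longer lead to any key, walking back toward the root.
        for depth in range(len(word), 0, -1):
            node = path[depth]
            if node.is_word or any(c is not None for c in node.child):
                break
            path[depth - 1].child[index_of(word[depth - 1])] = None
        return True


trie = Trie()
for w in ["bear", "bell", "bid", "bull", "buy", "sell", "stock", "stop"]:
    trie.insert(w)
```

Each operation visits d nodes. With a direct array index, finding a child takes constant time; scanning the children of a node, as `delete` does when pruning or as a linked-list representation would, costs up to m per node, which is where the d × m factor comes from.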
Word Matching Trie
Insert words into trie
Each leaf stores occurrences of word in the text
Text (characters indexed 0 to 88):
see a bear? sell stock! see a bull? buy stock! bid stock! bid stock! hear the bell? stop!
Occurrences stored at the leaves (starting index of each word):
bear: 6
bell: 78
bid: 47, 58
bull: 30
buy: 36
hear: 69
see: 0, 24
sell: 12
stock: 17, 40, 51, 62
stop: 84
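A word matching trie of this kind can be sketched in a few lines. The sketch assumes a nested-dict trie, a hypothetical `"#"` key for the occurrence list, and that a word is a run of lowercase letters; `TEXT` is the sample sentence as reconstructed from the character positions in the figure. Unlike the figure, it indexes every word, including "a" and "the".

```python
import re

TEXT = ("see a bear? sell stock! see a bull? buy stock! "
        "bid stock! bid stock! hear the bell? stop!")


def build_word_index(text):
    """Insert every word of `text` into a trie; the last node of each word
    stores the starting positions at which the word occurs."""
    root = {}
    for match in re.finditer(r"[a-z]+", text):
        node = root
        for ch in match.group():
            node = node.setdefault(ch, {})
        # "#" is a hypothetical key holding the occurrence list
        node.setdefault("#", []).append(match.start())
    return root


def occurrences(root, word):
    """Starting positions of `word` in the indexed text (empty list if absent)."""
    node = root
    for ch in word:
        if ch not in node:
            return []
        node = node[ch]
    return node.get("#", [])


index = build_word_index(TEXT)
print(occurrences(index, "stock"))  # [17, 40, 51, 62]
print(occurrences(index, "see"))    # [0, 24]
```

A lookup walks one node per character of the query word, so its cost does not depend on the length of the text.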
Compact Tries
For an array of strings S = S[0], …, S[s-1]
Store ranges of indices at each node instead of the substring
Represent each substring X as a triplet of integers (i, j, k) such that X = S[i][j..k]
Example: S[0] = “abcd”, (0, 1, 2) = “bc”
Uses O(s) space, where s = number of strings in the array
Serves as an auxiliary index structure
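Reading a triplet back into its substring is a one-line slice. A small sketch, where the function name `decode` is an assumption and both ends of the range j..k are taken as inclusive, as in the example above:

```python
def decode(S, triplet):
    """Return the substring S[i][j..k], both ends inclusive, named by (i, j, k)."""
    i, j, k = triplet
    return S[i][j:k + 1]


S = ["abcd"]
print(decode(S, (0, 1, 2)))  # bc
```

Because a node holds three integers whatever the length of its label, the trie itself stays small while the strings live in the separate array S.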
Compact Representation
[Figure: the strings S[0] … S[9] with their character positions, and the compact trie whose nodes carry the triplets (1, 1, 1), (1, 0, 0), (0, 0, 0), (4, 1, 1), (0, 2, 2), (3, 1, 2), (1, 2, 3), (8, 2, 3), (6, 1, 2), (4, 2, 3), (5, 2, 2), (2, 2, 3), (3, 3, 4), (9, 3, 3), (7, 0, 3), (0, 1, 1)]
Suffix Trie
Suffixes of IPDPS: IPDPS, PDPS, DPS, PS, S
An occurrence of a pattern ⇒ a prefix of some suffix
Example: find PDP in IPDPS
[Figure: suffix trie for IPDPS]
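The idea that every occurrence of a pattern is a prefix of some suffix can be sketched directly: insert all suffixes into a trie, then walk the pattern from the root. This is a naive, uncompressed sketch with hypothetical function names; it can take space quadratic in the text length, so linear space and construction time need a compressed representation and a more careful algorithm.

```python
def build_suffix_trie(text):
    """Insert every suffix of `text` into a nested-dict trie."""
    root = {}
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            node = node.setdefault(ch, {})
    return root


def contains(root, pattern):
    """True if `pattern` occurs in the text, i.e. is a prefix of some suffix."""
    node = root
    for ch in pattern:
        if ch not in node:
            return False
        node = node[ch]
    return True


trie = build_suffix_trie("IPDPS")
print(contains(trie, "PDP"))  # True
print(contains(trie, "PPD"))  # False
```

The search follows one edge per pattern character, so its cost depends on the pattern length and not on the text length.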
Suffix Trie
For a string X of length n, an alphabet of size m, and a pattern P of length d
Uses O(n) space
Can be constructed in O(n) time
Finds pattern P in X in O(d × m) time
Proportional to the length of the pattern, not the text
Computational Biology
DNA
Sequence of 4 different nucleotides (A, T, C, G)
Portions of the DNA sequence produce proteins (genes)
Genome
Master DNA sequence for an organism
For humans: 46 chromosomes, 3 billion nucleotides
Tries and Computational Biology
ESTs
Fragments of expressed DNA
Indicator for genes (and their location)
5.5 million sequences at NIH
Build suffix trie of genome: 8 hours, 60 Gbytes
Search for ESTs in suffix trie: 11 hours with an 8-processor Sun
5+ years (predicted)
[Figure labels: Genome, ESTs, Suffix tree, Mapping, Gene]