Indexed Search Trees (Tries): Types, Efficiency, and Applications - Prof. Fawzi Philip Ema, Study notes of Computer Science

An overview of indexed search trees (tries), a data structure used for searching strings. The authors discuss the concept of tries, their advantages, and different types such as standard, compressed, and compact tries. They also provide examples and applications in areas like Morse code, web search engines, and computational biology.

Uploaded on 02/13/2009

Indexed Search Tree (Trie)
Fawzi Emad
Chau-Wen Tseng
Department of Computer Science
University of Maryland, College Park

Indexed Search Tree (Trie)

Special case of tree

Applicable when

Key C can be decomposed into a sequence of subkeys C1, C2, …, Cn
Redundancy exists between subkeys

Approach

Store subkey at each node
Path through trie yields full key

Example

Huffman tree

[Figure: example trie whose nodes are labeled with subkeys C1, C2, C3, C4]

Tries

Useful for searching strings

String decomposes into sequence of letters
Example: “ART” → “A” “R” “T”

Can be very fast

Less overhead than hashing

May reduce memory

Exploiting redundancy

May require more memory

Explicitly storing substrings

[Figure: small trie over the letters S, A, R, E, T in which one path spells “ART”]

Types of Tries

Standard

Single character per node

Compressed

Eliminating chains of nodes

Compact

Stores indices into original string(s)

Suffix

Stores all suffixes of string
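The difference between a standard and a compressed trie can be made concrete with a short sketch. The Java below is only one possible illustration, not the authors' code: it builds a standard trie (one character per node) and then merges every chain of single-child, non-terminal nodes into one node with a longer label. The names CompressedTrie, label, endOfWord, compress, and nodeCount are assumptions made for this example.

```java
import java.util.TreeMap;

class CompressedTrie {
    static class Node {
        String label = "";                                 // characters on the edge into this node
        TreeMap<Character, Node> child = new TreeMap<>();  // keyed by first character of the child's label
        boolean endOfWord;                                 // a stored key ends at this node
    }

    final Node root = new Node();

    /** Standard insert: one character per node. Call only before compress(). */
    void insert(String key) {
        Node cur = root;
        for (char c : key.toCharArray()) {
            Node next = cur.child.get(c);
            if (next == null) {
                next = new Node();
                next.label = String.valueOf(c);
                cur.child.put(c, next);
            }
            cur = next;
        }
        cur.endOfWord = true;
    }

    /** Eliminates chains: a non-terminal node with exactly one child absorbs that child. */
    void compress() { compress(root); }

    private static void compress(Node n) {
        for (Node c : n.child.values()) {
            while (c.child.size() == 1 && !c.endOfWord) {
                Node only = c.child.firstEntry().getValue();
                c.label += only.label;
                c.endOfWord = only.endOfWord;
                c.child = only.child;
            }
            compress(c);
        }
    }

    /** Works both before and after compress(), since labels may be one or more characters. */
    boolean contains(String key) {
        Node cur = root;
        int pos = 0;
        while (pos < key.length()) {
            Node next = cur.child.get(key.charAt(pos));
            if (next == null || !key.startsWith(next.label, pos)) return false;
            pos += next.label.length();
            cur = next;
        }
        return cur.endOfWord;
    }

    /** Number of nodes, not counting the root. */
    int nodeCount() { return count(root) - 1; }

    private static int count(Node n) {
        int total = 1;
        for (Node c : n.child.values()) total += count(c);
        return total;
    }

    public static void main(String[] args) {
        CompressedTrie t = new CompressedTrie();
        String[] words = {"bear", "bell", "bid", "bull", "buy", "sell", "stock", "stop"};
        for (String w : words) t.insert(w);
        System.out.println(t.nodeCount()); // 21 nodes as a standard trie
        t.compress();
        System.out.println(t.nodeCount()); // 13 nodes once chains are merged
    }
}
```

Compressing after the fact keeps the sketch short; a production compressed trie would normally split and merge labels during insertion instead.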

Standard Trie Example

For strings

{ bear, bell, bid, bull, buy, sell, stock, stop }

[Figure: standard trie for the strings above, one character per node; the root branches on b and s]

Standard Tries

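Node structure

Value between 1…m
Reference to m children
Array or linked list

Example

class Node {
    static final int M = 26;     // m, the alphabet size (26 is an assumed value)
    char value;                  // a letter from the alphabet V = { V1, V2, …, Vm }
    Node[] child = new Node[M];  // references to up to m children
}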
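The node structure above maps fairly directly onto a small working trie. The sketch below is one possible Java rendering, not the authors' code: it assumes a lowercase alphabet 'a'..'z' (so m = 26), and the names StandardTrie, endOfWord, insert, and contains are illustrative.

```java
/** Minimal standard trie over 'a'..'z'; the alphabet size m = 26 is an assumption. */
class StandardTrie {
    static final int M = 26;

    static class Node {
        char value;                  // letter stored at this node
        Node[] child = new Node[M];  // references to up to m children
        boolean endOfWord;           // hypothetical flag: a stored key ends here
        Node(char value) { this.value = value; }
    }

    private final Node root = new Node('\0'); // the root holds no letter

    /** Walks the key, creating one node per character where needed. Keys must be lowercase a-z. */
    void insert(String key) {
        Node cur = root;
        for (char c : key.toCharArray()) {
            int i = c - 'a';
            if (cur.child[i] == null) cur.child[i] = new Node(c);
            cur = cur.child[i];
        }
        cur.endOfWord = true;
    }

    /** True only if the exact key was inserted; a mere prefix does not count. */
    boolean contains(String key) {
        Node cur = root;
        for (char c : key.toCharArray()) {
            int i = c - 'a';
            if (i < 0 || i >= M || cur.child[i] == null) return false;
            cur = cur.child[i];
        }
        return cur.endOfWord;
    }

    public static void main(String[] args) {
        StandardTrie t = new StandardTrie();
        String[] words = {"bear", "bell", "bid", "bull", "buy", "sell", "stock", "stop"};
        for (String w : words) t.insert(w);
        System.out.println(t.contains("bell")); // true
        System.out.println(t.contains("be"));   // false: "be" is only a prefix
    }
}
```

With an array of children, each step is a direct index lookup; the trade-off is that every node reserves m references even when most are unused.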

Standard Tries

Efficiency

Uses O(n) space
Supports search / insert / delete in O(d × m) time
(each of the d nodes visited may require scanning up to m child references)
For
n = total size of strings indexed by trie
d = length of the parameter string
m = size of the alphabet
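To see where the d × m factor and the O(n) space bound come from, it may help to look at the linked-list variant of the node structure. The sketch below is an illustration under assumed names (ListTrie, firstChild, nextSibling, findChild, size), not the authors' implementation: each of the d steps scans a sibling list of at most m nodes, and at most one node is created per inserted character.

```java
/** Standard trie whose children are kept in a linked list of siblings. */
class ListTrie {
    static class Node {
        char value;
        Node firstChild;   // head of this node's child list
        Node nextSibling;  // next child of the same parent
        boolean endOfWord;
        Node(char value) { this.value = value; }
    }

    private final Node root = new Node('\0');
    private int nodeCount = 0; // nodes created, not counting the root

    /** Scans the parent's child list for letter c: up to m comparisons. */
    private static Node findChild(Node parent, char c) {
        for (Node n = parent.firstChild; n != null; n = n.nextSibling) {
            if (n.value == c) return n;
        }
        return null;
    }

    void insert(String key) {
        Node cur = root;
        for (char c : key.toCharArray()) {          // d iterations
            Node next = findChild(cur, c);          // up to m steps each
            if (next == null) {
                next = new Node(c);
                next.nextSibling = cur.firstChild;  // push onto the front of the child list
                cur.firstChild = next;
                nodeCount++;
            }
            cur = next;
        }
        cur.endOfWord = true;
    }

    boolean contains(String key) {
        Node cur = root;
        for (char c : key.toCharArray()) {
            cur = findChild(cur, c);
            if (cur == null) return false;
        }
        return cur.endOfWord;
    }

    /** Nodes in the trie, not counting the root; never more than the total characters inserted. */
    int size() { return nodeCount; }
}
```

Inserting bear, bell, bid, bull, buy, sell, stock, and stop (31 characters in total) creates 21 nodes, because shared prefixes such as "be" and "st" are stored once.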

Word Matching Trie

Insert words into trie

Each leaf stores occurrences of word in the text

Text (characters indexed 0 to 88):

see a bear? sell stock! see a bull? buy stock! bid stock! bid stock! hear the bell? stop!

Starting indices stored at each word's leaf:

bear: 6
bell: 78
bid: 47, 58
bull: 30
buy: 36
hear: 69
see: 0, 24
sell: 12
stock: 17, 40, 51, 62
stop: 84
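A word matching trie of this kind can be sketched in a few lines. The Java below is an assumption-laden illustration rather than the authors' code: WordMatchingTrie, index, find, and occurrences are made-up names, a word is taken to be any maximal run of letters, and every word is indexed (including short ones such as "a" and "the").

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

class WordMatchingTrie {
    static class Node {
        TreeMap<Character, Node> child = new TreeMap<>();
        List<Integer> occurrences = new ArrayList<>(); // start indices of the word ending at this node
    }

    private final Node root = new Node();

    /** Inserts every maximal run of letters in the text and records where it starts. */
    void index(String text) {
        int i = 0;
        while (i < text.length()) {
            if (!Character.isLetter(text.charAt(i))) { i++; continue; }
            int start = i;
            Node cur = root;
            while (i < text.length() && Character.isLetter(text.charAt(i))) {
                cur = cur.child.computeIfAbsent(text.charAt(i), k -> new Node());
                i++;
            }
            cur.occurrences.add(start);
        }
    }

    /** Returns the start indices of word in the indexed text, or an empty list. */
    List<Integer> find(String word) {
        Node cur = root;
        for (char c : word.toCharArray()) {
            cur = cur.child.get(c);
            if (cur == null) return new ArrayList<>();
        }
        return cur.occurrences;
    }

    public static void main(String[] args) {
        WordMatchingTrie t = new WordMatchingTrie();
        t.index("see a bear? sell stock! see a bull? buy stock! bid stock! bid stock! hear the bell? stop!");
        System.out.println(t.find("stock")); // [17, 40, 51, 62]
        System.out.println(t.find("see"));   // [0, 24]
    }
}
```

Looking up a word costs time proportional to the word's length, regardless of how long the indexed text is.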

Compact Tries

Compact representation of a compressed trie

Approach

For an array of strings S = S[0], …, S[s-1]
Store ranges of indices at each node instead of the substring
Represent as a triplet of integers (i, j, k) such that X = S[i][j..k]
Example: S[0] = “abcd”, (0,1,2) = “bc”

Properties

Uses O(s) space, where s = number of strings in the array
Serves as an auxiliary index structure
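The triplet encoding is easy to check in code. The following is a minimal sketch with hypothetical names (CompactLabel, resolve): a node stores only three integers and recovers its substring from the string array on demand. Since k is inclusive in the notes' notation, Java's substring needs k + 1 as its end index.

```java
/** A compact trie node label: the substring S[i][j..k], stored as three integers. */
class CompactLabel {
    final int i; // index of the string in the array S
    final int j; // first character position (inclusive)
    final int k; // last character position (inclusive)

    CompactLabel(int i, int j, int k) {
        this.i = i;
        this.j = j;
        this.k = k;
    }

    /** Recovers the label text; substring's end index is exclusive, hence k + 1. */
    String resolve(String[] s) {
        return s[i].substring(j, k + 1);
    }

    public static void main(String[] args) {
        String[] s = {"abcd"};
        System.out.println(new CompactLabel(0, 1, 2).resolve(s)); // bc
    }
}
```

Each node therefore takes constant space no matter how long its label is, which is why the structure only makes sense alongside the original string array.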

Compact Representation

Example

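String array:

S[0] = “see”
S[1] = “bear”
S[2] = “sell”
S[3] = “stock”
S[4] = “bull”
S[5] = “buy”
S[6] = “bid”
S[7] = “hear”
S[8] = “bell”
S[9] = “stop”

Nodes of the compact trie, each shown as triplet (i, j, k) with the substring it denotes:

Children of the root: (1, 0, 0) “b”, (7, 0, 3) “hear”, (0, 0, 0) “s”
Under “b”: (1, 1, 1) “e”, (6, 1, 2) “id”, (4, 1, 1) “u”
Under “be”: (1, 2, 3) “ar”, (8, 2, 3) “ll”
Under “bu”: (4, 2, 3) “ll”, (5, 2, 2) “y”
Under “s”: (0, 1, 1) “e”, (3, 1, 2) “to”
Under “se”: (0, 2, 2) “e”, (2, 2, 3) “ll”
Under “sto”: (3, 3, 4) “ck”, (9, 3, 3) “p”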

Suffix Trie

Compressed trie of all suffixes of text

Example: “IPDPS”

Suffixes: IPDPS, PDPS, DPS, PS, S

Useful for finding pattern in any part of text

Occurrence ⇔ prefix of some suffix
Example: find PDP in IPDPS (PDP is a prefix of the suffix PDPS)

[Figure: compressed trie of the suffixes of “IPDPS”]

Suffix Trie

Properties

For
String X with length n
Alphabet of size m
Pattern P with length d

Uses O(n) space
Can be constructed in O(n) time
Find pattern P in X in O(d × m) time
Proportional to length of pattern, not text
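The idea that a pattern occurs in the text exactly when it is a prefix of some suffix can be demonstrated with a deliberately naive sketch. The Java below is not the linear-time construction referred to above: it builds an uncompressed suffix trie by inserting every suffix character by character, which takes O(n²) time and space. The names NaiveSuffixTrie and containsPattern are assumptions for this example.

```java
import java.util.HashMap;
import java.util.Map;

/** Uncompressed suffix trie built the slow way; for illustration only. */
class NaiveSuffixTrie {
    static class Node {
        Map<Character, Node> child = new HashMap<>();
    }

    private final Node root = new Node();

    /** Inserts every suffix of text, one character per node. */
    NaiveSuffixTrie(String text) {
        for (int start = 0; start < text.length(); start++) {
            Node cur = root;
            for (int i = start; i < text.length(); i++) {
                cur = cur.child.computeIfAbsent(text.charAt(i), c -> new Node());
            }
        }
    }

    /** A pattern occurs in the text iff it can be read along a path from the root. */
    boolean containsPattern(String pattern) {
        Node cur = root;
        for (char c : pattern.toCharArray()) {
            cur = cur.child.get(c);
            if (cur == null) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        NaiveSuffixTrie t = new NaiveSuffixTrie("IPDPS");
        System.out.println(t.containsPattern("PDP")); // true: prefix of the suffix PDPS
        System.out.println(t.containsPattern("PDS")); // false
    }
}
```

The search loop runs once per pattern character and never looks at the text again, which is the property the notes highlight: query time depends on the pattern length, not the text length.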

Computational Biology

DNA

Sequence of 4 different nucleotides (ATCG)
Portions of DNA sequence produce proteins (genes)

Genome

Master DNA sequence for organism
For human: 46 chromosomes, 3 billion nucleotides

Tries and Computational Biology

ESTs (expressed sequence tags)

Fragments of expressed DNA
Indicator for genes (& location)
5.5 million sequences at NIH

ESTmapper

Build suffix trie of genome: 8 hours, 60 Gbytes
Search for ESTs in suffix trie: 11 hours w/ 8-processor Sun

Search genome w/ BLAST

5+ years (predicted)

[Figure: ESTs are mapped to genes in the genome through a suffix tree]