String Matching-Algorithm Design and Analysis for Strings-Lecture Slides, Slides of Design and Analysis of Algorithms

Maulana Abul Kalam Azad University of Technology Design and Analysis of Algorithms

This lecture is part of the lecture series for the Design and Analysis of Algorithms course, taught by Dr. Bhaskar Sanyal at Maulana Azad National Institute of Technology. It includes: String, Matching, Pattern, Exact, Searching, Keywords, Database, Substring, Subsequence, Brute-Force, Algorithm

Typology: Slides

2011/2012

Uploaded on 07/11/2012

dharmadaas 🇮🇳


String Matching


String Matching Problem

- Given a text string T of length n and a pattern string P of length m, the exact string matching problem is to find all occurrences of P in T.
- Example: T = “AGCTTGA”, P = “GCT”
- Applications:
  - Searching for keywords in a file
  - Search engines (like Google and Openfind)
  - Database searching (GenBank)

A Brute-Force Algorithm

Align P with each position of T in turn and compare symbol by symbol.

Time: O(mn), where m = |P| and n = |T|.
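The brute-force idea can be sketched as follows. This is a minimal illustration rather than the slide's own code: the function name `brute_force_search` and the choice to report 0-based positions are assumptions made here.

```python
def brute_force_search(text: str, pattern: str) -> list[int]:
    """Return all 0-based positions where pattern occurs in text."""
    n, m = len(text), len(pattern)
    matches = []
    # Try every alignment of the pattern against the text: O(n) alignments.
    for start in range(n - m + 1):
        k = 0
        # Compare symbol by symbol: at most m comparisons per alignment.
        while k < m and text[start + k] == pattern[k]:
            k += 1
        if k == m:
            matches.append(start)
    return matches


# The example from the slides: T = "AGCTTGA", P = "GCT".
print(brute_force_search("AGCTTGA", "GCT"))  # [1]
```

The nested loops make the O(mn) bound visible: up to n − m + 1 alignments, each costing up to m comparisons.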

Two-phase Algorithms

- Phase 1: Generate an array that indicates how the pattern should move after a mismatch.
- Phase 2: Make use of the array to move the pattern and match the string.
- KMP algorithm:
  - Proposed by Knuth, Morris and Pratt in 1977.
- Boyer-Moore algorithm:
  - Proposed by Boyer and Moore in 1977.

Second Case for KMP Algorithm

- The first symbol of P appears in P again.
- $T_7 \neq P_7$ in (a). We have to slide P to $T_6$, since $P_6 = P_1 = T_6$.

Third Case for KMP Algorithm

- A prefix of P appears in P again.
- $T_8 \neq P_8$ in (a). We have to slide P to $T_6$, since $P_{6,7} = P_{1,2} = T_{6,7}$.

Definition of the Prefix Function

- $f(j) = k$, where $k$ is the largest $k < j$ such that $P_{1,k} = P_{j-k+1,j}$
- $f(j) = 0$ if no such $k$ exists

Here $P_{a,b}$ denotes the substring $P_a P_{a+1} \cdots P_b$.
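The definition can be checked directly with a short, deliberately naive sketch. The function name `prefix_function_naive` is made up for this illustration, and using the pattern "ATCACATCATCA" is an assumption: the slides' worked values $f(4) = 1$, $f(9) = 4$, $f(10) = 2$ and $f(12) = 4$ agree with it, but the preview does not state the pattern explicitly.

```python
def prefix_function_naive(pattern: str) -> list[int]:
    """Return [f(1), ..., f(m)] computed straight from the definition.

    f(j) is the largest k < j with P[1..k] == P[j-k+1..j] (1-based), or 0.
    """
    m = len(pattern)
    f = []
    for j in range(1, m + 1):
        best = 0
        # Try k = j-1 down to 1; the first hit is the largest valid k.
        for k in range(j - 1, 0, -1):
            # P_{1,k} is pattern[0:k]; P_{j-k+1,j} is pattern[j-k:j].
            if pattern[:k] == pattern[j - k:j]:
                best = k
                break
        f.append(best)
    return f


# Assumed pattern; f is listed for j = 1..12.
print(prefix_function_naive("ATCACATCATCA"))
# [0, 0, 0, 1, 0, 1, 2, 3, 4, 2, 3, 4]
```

This version is far slower than the linear-time construction; it is only meant to make the definition concrete.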

Calculation of the Prefix Function

To determine $f(5)$:

- $f(4) = 1$, thus $P_4 = P_1$.
- If $P_5 = P_2$, then we get $f(5) = f(4) + 1$.
- If $P_5 \neq P_2$, then we check whether $P_5 = P_1$.
- Because $P_5 \neq P_1$ as well, we get $f(5) = 0$.

- $f(4) = 1$ because $P_4 = P_{f(4-1)+1} = P_1 =$ "A".
- $f(9) = 4$ because $P_9 = P_{f(9-1)+1} = P_4$.

To determine $f(10)$:

- $P_{10} \neq P_{f(10-1)+1} = P_5$, because "T" $\neq$ "C".
- $P_{10} = P_{f(f(10-1))+1} = P_{f(4)+1} = P_2 =$ "T".
- Hence $f(10) = f(f(10-1)) + 1 = f(4) + 1 = 2$.

Computing the Failure Function

- The failure function can be represented by an array and can be computed in O(m) time.
- The construction is similar to the KMP algorithm itself.
- At each iteration of the while-loop, either
  - i increases by one, or
  - the shift amount i − j increases by at least one (observe that F(j − 1) < j).
- Hence, there are no more than 2m iterations of the while-loop.

Algorithm failureFunction(P)
    F[0] ← 0
    i ← 1
    j ← 0
    while i < m
        if P[i] = P[j]
            {we have matched j + 1 chars}
            F[i] ← j + 1
            i ← i + 1
            j ← j + 1
        else if j > 0 then
            {use failure function to shift P}
            j ← F[j − 1]
        else
            F[i] ← 0 {no match}
            i ← i + 1
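A direct Python transcription of the pseudocode might look like the sketch below. The iteration counter and the sample pattern "abacab" are additions for illustration only; they are not part of the slide. The counter lets the bound of at most 2m while-loop iterations be checked on a concrete input.

```python
def failure_function(pattern: str) -> tuple[list[int], int]:
    """Return (F, iterations) for a 0-indexed pattern.

    F[i] is the length of the longest proper prefix of pattern[0..i]
    that is also a suffix of pattern[0..i].
    """
    m = len(pattern)
    if m == 0:
        return [], 0
    F = [0] * m
    i, j = 1, 0
    iterations = 0  # illustrative counter, not in the pseudocode
    while i < m:
        iterations += 1
        if pattern[i] == pattern[j]:
            # We have matched j + 1 characters.
            F[i] = j + 1
            i += 1
            j += 1
        elif j > 0:
            # Use the failure function to shift the pattern.
            j = F[j - 1]
        else:
            F[i] = 0  # no match
            i += 1
    return F, iterations


F, steps = failure_function("abacab")
print(F)                            # [0, 0, 1, 0, 1, 2]
print(steps <= 2 * len("abacab"))   # True
```

With 0-based indexing, `F[i]` here corresponds to the prefix function value f(i + 1) in the 1-based notation used earlier.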

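An Example for KMP Algorithm

[Figure: Phase 1 and Phase 2 of the KMP algorithm on an example. Annotations recovered from the figure: $f(4-1)+1 = f(3)+1 = 0+1 = 1$; $f(12)+1 = 4+1 = 5$; "matched".]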

Time Complexity of KMP Algorithm

- Time complexity: O(m + n) (analysis omitted)
  - O(m) for computing function f
  - O(n) for searching P
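The two phases can be put together in one short sketch. It assumes 0-based indexing and returns 0-based match positions; the name `kmp_search` and the decision to return no matches for an empty pattern are choices made for this example, not something the slides specify.

```python
def kmp_search(text: str, pattern: str) -> list[int]:
    """Return all 0-based positions where pattern occurs in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []  # assumption: an empty pattern yields no matches

    # Phase 1: build the failure array F in O(m) time.
    F = [0] * m
    j = 0
    for i in range(1, m):
        while j > 0 and pattern[i] != pattern[j]:
            j = F[j - 1]
        if pattern[i] == pattern[j]:
            j += 1
        F[i] = j

    # Phase 2: scan the text once in O(n) time; the index i never moves back.
    matches = []
    j = 0
    for i in range(n):
        while j > 0 and text[i] != pattern[j]:
            j = F[j - 1]
        if text[i] == pattern[j]:
            j += 1
        if j == m:
            matches.append(i - m + 1)
            j = F[j - 1]  # keep going to find overlapping occurrences
    return matches


print(kmp_search("AGCTTGA", "GCT"))  # [1]
```

Because the text index only advances and j can fall back no more often than it has advanced, the search phase performs a number of comparisons linear in n.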

Suffix Trees

- A suffix tree for S = “ATCACATCATCA” [figure omitted]

Properties of a Suffix Tree

- Each tree edge is labeled by a substring of S.
- Each internal node has at least 2 children.


- Each suffix S(i) has its corresponding labeled path from root to a leaf, for 1 ≤ i ≤ n.
- There are n leaves.
- No edges branching out from the same internal node can start with the same character.
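These properties can be checked on a small, naively built suffix tree. The sketch below is a quadratic-time illustration, not an efficient construction, and all names in it (`Node`, `build_suffix_tree`, `count_leaves`, `internal_nodes_ok`, `leaf_paths`) are hypothetical. The slides do not say how the end of S is marked; appending a terminal symbol "$" that does not occur in S is an assumption made here so that no suffix is a prefix of another and every suffix ends at its own leaf.

```python
class Node:
    def __init__(self):
        # Maps the first character of an edge label to (label, child).
        # Keying on the first character guarantees that no two edges
        # leaving a node start with the same character.
        self.children = {}


def build_suffix_tree(s: str, end: str = "$") -> Node:
    """Insert each suffix s[i:] + end in turn (assumes end is not in s)."""
    root = Node()
    for i in range(len(s)):
        suffix = s[i:] + end
        node = root
        while suffix:
            first = suffix[0]
            if first not in node.children:
                node.children[first] = (suffix, Node())  # new leaf
                break
            label, child = node.children[first]
            k = 0
            while k < len(label) and k < len(suffix) and label[k] == suffix[k]:
                k += 1
            if k < len(label):
                # Split the edge where the label and the suffix diverge.
                mid = Node()
                mid.children[label[k]] = (label[k:], child)
                node.children[first] = (label[:k], mid)
                child = mid
            node, suffix = child, suffix[k:]
    return root


def count_leaves(node: Node) -> int:
    if not node.children:
        return 1
    return sum(count_leaves(child) for _, child in node.children.values())


def internal_nodes_ok(node: Node, is_root: bool = True) -> bool:
    """Check that every internal node other than the root has >= 2 children."""
    if not node.children:
        return True
    if not is_root and len(node.children) < 2:
        return False
    return all(internal_nodes_ok(c, False) for _, c in node.children.values())


def leaf_paths(node: Node, prefix: str = "") -> list[str]:
    """Concatenate edge labels along every root-to-leaf path."""
    if not node.children:
        return [prefix]
    paths = []
    for label, child in node.children.values():
        paths.extend(leaf_paths(child, prefix + label))
    return paths


tree = build_suffix_tree("ATCACATCATCA")
print(count_leaves(tree))       # 12
print(internal_nodes_ok(tree))  # True
```

Each root-to-leaf path returned by `leaf_paths` spells out one suffix of S followed by the assumed terminal symbol.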