DFA for String Matching: Construction and Correctness, Study notes of Algorithms and Programming

These notes cover the construction and correctness of a deterministic finite automaton (DFA) for string matching. The DFA is used to find occurrences of a given pattern in a text string. The notes present the general construction scheme, including the final state function and the suffix function, together with the DFA-based matching algorithm, and discuss the algorithm's time and space complexity.

Typology: Study notes

Pre 2010

Uploaded on 08/30/2009

koofers-user-m2w


COT 5405: Fall 2006

Lecture 23

DFA for String matching

Finite Automaton

  1. Set of states, Q.
  2. Start state, q0 ∈ Q.
  3. Set of accepting states, A ⊆ Q.
  4. Alphabet, Σ.
  5. Transition function, δ: Q × Σ → Q.

General Construction Scheme

Final state function: φ(w) is the state after scanning w.

  • φ(ε) = q0.
  • φ(wa) = δ(φ(w), a), for w ∈ Σ*, a ∈ Σ.
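The recursive definition of φ can be sketched in Python as a simple loop over the input. This is only an illustration: the name `final_state`, the dictionary representation of δ, and the two-state parity automaton are assumptions made for this example and do not come from the lecture.

```python
def final_state(delta, w, q0=0):
    """Return φ(w): the state reached from q0 after scanning the string w."""
    q = q0                     # φ(ε) = q0
    for a in w:
        q = delta[(q, a)]      # φ(wa) = δ(φ(w), a)
    return q

# Hypothetical automaton over {a, b}: the state is the parity of the number of a's.
delta = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1}

print(final_state(delta, ""))      # 0
print(final_state(delta, "abaa"))  # 1 (three a's, odd parity)
```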

Suffix function: σ(x) = max{k : P[1 … k] is a suffix of x}, where P is the pattern, m = |P|, and P_k = P[1 … k].

  • σ(x) is the length of the longest prefix of P that is also a suffix of x.
  • P_0 = ε is a suffix of all strings.

Construction: Q = {0, 1, …, m}, q0 = 0, A = {m}, δ(q, a) = σ(P_q a).

  • Note: σ(x) = m iff P is a suffix of x, implying that a match has been found.
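A direct, unoptimized reading of the suffix function and of the rule δ(q, a) = σ(P_q a) might look like the following Python sketch. The function names and the sample pattern "ababaca" are assumptions chosen for illustration. Python strings are 0-indexed, so P_q corresponds to the slice `P[:q]`.

```python
def sigma(x, P):
    """Suffix function: length of the longest prefix of P that is a suffix of x."""
    # Try the longest candidate first; k = 0 (the empty prefix) always matches.
    for k in range(min(len(P), len(x)), -1, -1):
        if x.endswith(P[:k]):
            return k

def delta(q, a, P):
    """Transition from state q on character a: σ(P_q a)."""
    return sigma(P[:q] + a, P)

P = "ababaca"
print(sigma("ccaca", "ab"))             # 1
print(delta(5, "b", P))                 # 4: "abab" is a suffix of "ababab"
print(delta(5, "c", P))                 # 6: the match is extended by one character
print(sigma("xxababaca", P) == len(P))  # True: P is a suffix, so a match is found
```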

DFA-based Matching

FA-Matcher(T, δ, m)

  • q ← 0
  • for i = 1 to n
      o q ← δ(q, T[i])
      o if q == m
          ▪ Print i - m

This takes Θ(n) time and Θ(m |Σ|) space.
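The matcher translates almost line for line into Python. The sketch below assumes δ is stored as a dictionary keyed by (state, character) and that every character of T belongs to Σ. It returns the shifts as a list instead of printing them, and the transition table for the pattern "ab" is written out by hand for illustration.

```python
def fa_matcher(T, delta, m):
    """Return every shift i - m at which the pattern ends at position i of T."""
    q = 0
    shifts = []
    for i, c in enumerate(T, start=1):   # i runs from 1 to n, as in the pseudocode
        q = delta[(q, c)]
        if q == m:
            shifts.append(i - m)
    return shifts

# Hand-built table for P = "ab" over the alphabet {a, b}, using δ(q, a) = σ(P_q a).
delta_ab = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 1, (1, "b"): 2,
    (2, "a"): 1, (2, "b"): 0,
}

print(fa_matcher("aabab", delta_ab, 2))  # [1, 3]
```

Each text character costs one table lookup, which is where the Θ(n) matching time comes from. The table itself has (m + 1) |Σ| entries.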

Correctness of Construction

We wish to prove that the state is σ(T_i) after scanning T[1 … i]. That is, we wish to prove that φ(T_i) = σ(T_i).

Theorem 32.4: φ(T_i) = σ(T_i), i = 0, …, n.

Proof: We prove the theorem by induction on i.

Base case: φ(T_0) = 0 = σ(T_0).

Induction hypothesis: Assume φ(T_i) = σ(T_i).

We wish to prove that φ(T_{i+1}) = σ(T_{i+1}).

φ(T_{i+1}) = φ(T_i T[i+1])
           = δ(φ(T_i), T[i+1])      (from the definition of φ)
           = σ(P_{φ(T_i)} T[i+1])   (from the definition of δ)
           = σ(P_{σ(T_i)} T[i+1])   (from the induction hypothesis)
           = σ(T_i T[i+1])          (from Lemma 32.3: if q = σ(x), then σ(xa) = σ(P_q a))
           = σ(T_{i+1}). Q.E.D.
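The theorem can also be checked empirically on a small input. The Python sketch below is a sanity check and not a proof. It builds δ from the definition, scans a sample text, and asserts after each character that the automaton's state equals σ(T_i). The helper names, the pattern "ababaca", and the text are assumptions chosen for illustration.

```python
def sigma(x, P):
    """Length of the longest prefix of P that is also a suffix of x."""
    for k in range(min(len(P), len(x)), -1, -1):
        if x.endswith(P[:k]):
            return k

def build_delta(P, alphabet):
    """Transition table from the definition δ(q, a) = σ(P_q a)."""
    return {(q, a): sigma(P[:q] + a, P)
            for q in range(len(P) + 1) for a in alphabet}

P, T, alphabet = "ababaca", "abababacaba", "abc"
delta = build_delta(P, alphabet)

q = 0
states = [q]                       # φ(T_0) = 0 = σ(T_0)
for i in range(1, len(T) + 1):
    q = delta[(q, T[i - 1])]       # φ(T_i) = δ(φ(T_{i-1}), T[i])
    assert q == sigma(T[:i], P)    # the claim of the theorem for this i
    states.append(q)

print(states)  # [0, 1, 2, 3, 4, 5, 4, 5, 6, 7, 2, 3]
```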

Constructing δ

  • for q = 0 to m                                Θ(m) iterations
      o for each a ∈ Σ                            Θ(|Σ|) iterations
          ▪ k ← m + 1
          ▪ Repeat k ← k - 1                      O(m) iterations
              - until P_k is a suffix of P_q a    O(m) time per test
          ▪ δ(q, a) ← k

This takes O(m^3 |Σ|) time. This can be improved to O(m |Σ|).
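The Python sketch below shows both constructions side by side. `build_delta_naive` follows the pseudocode above. `build_delta_fast` is one well-known way to reach O(m |Σ|), not necessarily the method the lecture had in mind: it keeps a restart state x = σ(P[2 … q]) and copies the mismatch transitions of row q from row x. Both function names are assumptions, and the code assumes every character of P belongs to the alphabet.

```python
def build_delta_naive(P, alphabet):
    """O(m^3 |Σ|) construction, following the pseudocode."""
    m = len(P)
    delta = {}
    for q in range(m + 1):
        for a in alphabet:
            k = m + 1
            while True:                          # repeat ... until
                k -= 1
                if (P[:q] + a).endswith(P[:k]):  # P_k is a suffix of P_q a
                    break
            delta[(q, a)] = k
    return delta

def build_delta_fast(P, alphabet):
    """O(m |Σ|) construction using a restart state (assumed technique)."""
    m = len(P)
    delta = {(0, a): 0 for a in alphabet}
    if m == 0:
        return delta
    delta[(0, P[0])] = 1
    x = 0                                        # x = σ(P without its first character, up to q)
    for q in range(1, m + 1):
        for a in alphabet:
            delta[(q, a)] = delta[(x, a)]        # mismatch: behave like state x
        if q < m:
            delta[(q, P[q])] = q + 1             # match: advance one state
            x = delta[(x, P[q])]                 # update the restart state
    return delta

P, alphabet = "ababaca", "abc"
print(build_delta_naive(P, alphabet) == build_delta_fast(P, alphabet))  # True
```

The fast version works because x is always smaller than q, so row x is complete by the time row q copies from it.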