String Matching - Computer Engineering - Lecture Slides, Slides of Computer Science

Some concepts of Computer Engineering are Binary Search, Byzantine Generals, Euclid Sequences, Houses and Utilities, Malfunction Diagnosis. Main points of this lecture are: String Matching, Word Search Puzzles, Problems, Computer, Word Search, Binary Search, Byzantine Generals, Cryptography, Easy Hard Impossible, Malfunction Diagnosis

Typology: Slides

2012/2013

Uploaded on 04/30/2013

naji

String Matching
Ten Puzzling Problems in Computer Engineering

Word Search Puzzles

Type 1, With Word List Supplied

AGITATOR ASSEMBLY CLUTCH CONNECTORS CONTROL COUPLING

GLIDE LINT SCREEN PULLEY SEAL SWITCH VALVE

AMY STEEL KEVIN BLAIR RON PALILLO BARBARA BINGHAM KIRSTEN BAKER SHAVAR ROSS BRUCE MAHLER LARRY ZERNER STU CHARNO

CAROL LACATELL MARK NELSON SUSAN BLU DANA KIMMELL PAUL KRATKA TONY GOLDWYN JOHN FUREY RICHARD YOUNG TRACIE SAVAGE

The puzzle below is a little harder than the normal word search: one of the 36 first/last names has been left out (which one?)

Word Search Puzzle

Type 2, With Clues Supplied for the Words to be Found

L E A G L E U R O K R D

P O X W Y A R D R X E O

I E O T H Y R O I D T L

H N S N T E T P B N E L

A J C O Z S L M O I M A

W Z O H C J N M I U N R

K F J E R S E Y E L N B

V E G R E T X Z J T E D

Seven birds

Five units of length

Four currencies

Two things football players wear

Large gland in the neck

USA Today’s “Word Roundup” for May 16, 2007: http://puzzles.usatoday.com/

A Challenging Hybrid Word Search Puzzle

String Matching: Problem Definition

Given a data string with n symbols and a pattern string with m symbols:

  1. Does the pattern string appear in the data string?
  2. What are the locations of all occurrences of the pattern in the data?

00          12          24          36
LEAGLEUROKRDPOXWYARDRXEOIEOTHYROIDTLHNSNTETPBNEL

48          60          72          84
AJCOZSLMOIMAWZOHCJNMIUNRKFJERSEYELNBVEGRETXZJTED

Data string of length n = 96 symbols

Pattern string of length m = 5 symbols: EAGLE

The brute-force, or sliding window, algorithm:

  1. Consider all possible positions where the pattern might begin (there are n − m + 1 of them).
  2. For each start position, do up to m comparisons to see if there is a match.

Worst-case complexity = O(mn); e.g., pattern “aaaaa”, data “aaaaaaaaaa”
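The sliding-window procedure can be sketched in a few lines of Python. This is a minimal illustration, not code from the slides; the function name `brute_force_search` is an assumption made for this example, and the data string is the 96-symbol puzzle string shown earlier.

```python
def brute_force_search(data: str, pattern: str) -> list[int]:
    """Return every start index at which pattern occurs in data."""
    n, m = len(data), len(pattern)
    matches = []
    # Try each of the n - m + 1 possible start positions.
    for start in range(n - m + 1):
        k = 0
        # Compare up to m symbols, stopping at the first mismatch.
        while k < m and data[start + k] == pattern[k]:
            k += 1
        if k == m:
            matches.append(start)
    return matches


# The 96-symbol data string from the puzzle (row-major order).
DATA = ("LEAGLEUROKRDPOXWYARDRXEOIEOTHYROIDTLHNSNTETPBNEL"
        "AJCOZSLMOIMAWZOHCJNMIUNRKFJERSEYELNBVEGRETXZJTED")

print(brute_force_search(DATA, "EAGLE"))          # [1]
print(brute_force_search("aaaaaaaaaa", "aaaaa"))  # [0, 1, 2, 3, 4, 5]
```

In the second call every one of the 6 windows needs all 5 comparisons, which is the kind of input that drives the running time toward O(mn).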

Converting 2D Search Puzzles to 1D Searches

A 2D word search puzzle looks more exotic but it can be readily converted to a 1D string search puzzle

L E A G L E U R O E X T
R A X W Y A R D R X E O
I E O T H Y R O I D T L
H N S N T E T P B N E L
A J C O Z S L M O I M A
W Z O H C J N M I U N R
K F J E R S E Y E L N B
V E G R E T X Z J T E D

Row-major order:

LEAGLEUROEXTRAXWYARDRXEOIEOTHYROIDTLHNSNTETPBNEL
AJCOZSLMOIMAWZOHCJNMIUNRKFJERSEYELNBVEGRETXZJTED

Insert a special symbol (#) between rows to ensure that new words or patterns are not created by the expansion (here, EXTRA would otherwise appear across the boundary between the first two rows):

LEAGLEUROEXT#RAXWYARDRXEO#IEOTHYROIDTL#HNSNTETPBNEL#
AJCOZSLMOIMA#WZOHCJNMIUNR#KFJERSEYELNB#VEGRETXZJTED#

Column-major order is handled the same way, and similarly for diagonals and antidiagonals.
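The conversion can be sketched as follows. This is an illustrative example only; the helper name `linearize` and the dictionary keys are assumptions made here, and the grid is assumed to be rectangular and non-empty. Words written backward in the puzzle can be found by also searching for the reversed pattern.

```python
def linearize(grid: list[str], sep: str = "#") -> dict[str, str]:
    """Flatten a rectangular letter grid into 1D strings, one per direction."""
    rows, cols = len(grid), len(grid[0])
    # Row-major: each row left to right, rows joined by the separator.
    row_major = sep.join(grid)
    # Column-major: each column top to bottom.
    column_major = sep.join(
        "".join(grid[r][c] for r in range(rows)) for c in range(cols)
    )
    # Diagonals: cells sharing the same value of (column - row).
    diagonals = sep.join(
        "".join(grid[r][r + d] for r in range(rows) if 0 <= r + d < cols)
        for d in range(-(rows - 1), cols)
    )
    # Antidiagonals: cells sharing the same value of (row + column).
    antidiagonals = sep.join(
        "".join(grid[r][s - r] for r in range(rows) if 0 <= s - r < cols)
        for s in range(rows + cols - 1)
    )
    return {
        "row_major": row_major,
        "column_major": column_major,
        "diagonals": diagonals,
        "antidiagonals": antidiagonals,
    }


GRID = [
    "LEAGLEUROEXT", "RAXWYARDRXEO", "IEOTHYROIDTL", "HNSNTETPBNEL",
    "AJCOZSLMOIMA", "WZOHCJNMIUNR", "KFJERSEYELNB", "VEGRETXZJTED",
]
lines = linearize(GRID)

print("EXTRA" in "".join(GRID))        # True: a false match spanning two rows
print("EXTRA" in lines["row_major"])   # False: the separator blocks it
print("HAWK" in lines["column_major"])  # True: read down the first column
```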

Needle in a Haystack: Internet Search

Search for the 10-symbol string “helen hunt”

2.1M hits 3 years ago; 5.4M+ hits in mid 2012

Needle in a Haystack: Doing Less Work

For a particular pattern and unpredictable data strings, preprocess the pattern so that searching for it in various data strings becomes faster.

Analogy: Magnetize the needle

For a particular data string and unpredictable patterns, preprocess the data string so that when a pattern is supplied, we can readily find it with much less work.

Analogy: Do a thorough search of the haystack for different types of needles and place markers to guide future searches
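One well-known way to preprocess the pattern is the Knuth-Morris-Pratt (KMP) method. The slides do not name a specific algorithm, so the sketch below is offered only as one possible instance of the idea; the function names are assumptions made for this example. The pattern is analyzed once to build a table that tells the search how far it can shift after a mismatch without re-reading data symbols.

```python
def failure_table(pattern: str) -> list[int]:
    """fail[i] = length of the longest proper prefix of pattern[:i + 1]
    that is also a suffix of it."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_search(data: str, pattern: str) -> list[int]:
    """Return all start indices of pattern in data, reading data once."""
    if not pattern:
        return []
    fail = failure_table(pattern)  # depends on the pattern only
    matches = []
    k = 0  # number of pattern symbols currently matched
    for i, symbol in enumerate(data):
        # On a mismatch, fall back using the table instead of rescanning data.
        while k > 0 and symbol != pattern[k]:
            k = fail[k - 1]
        if symbol == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            k = fail[k - 1]  # keep going to find overlapping matches
    return matches


print(failure_table("abcbab"))            # [0, 0, 0, 0, 1, 2]
print(kmp_search("aaaaaaaaaa", "aaaaa"))  # [0, 1, 2, 3, 4, 5]
```

Because the table is built from the pattern alone, it can be reused across any number of data strings, which is the situation described in the first strategy above.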

Example of Preprocessing the Data String

Devise an efficient method for finding various patterns in the data string:

Symbol:  a  b  c  b  b  b  a  b  c  b  a  b  b  c  a  a  b  c  b  a  b  c  b  a  b  c  b  b
Index:   0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27

Substring  Locations
aab        14
abb        10
abc        0, 6, 15, 19, 23
bab        5, 9, 18, 22
bba        4
bbb        3
bbc        11
bca        12
bcb        1, 7, 16, 20, 24
caa        13
cba        8, 17, 21
cbb        2, 25

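Find all occurrences of the pattern “abcbab”

Substring  Locations
abc        0, 6, 15, 19, 23
bcb        1, 7, 16, 20, 24
cba        8, 17, 21
bab        5, 9, 18, 22

The pattern occurs at location p when abc is at p, bcb at p + 1, cba at p + 2, and bab at p + 3, which gives p = 6, 15, 19.

Alternate strategy: Focus on the locations of abc and bab only. Together they cover all six symbols of the pattern, so it is enough to find locations p with abc at p and bab at p + 3.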
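The index of three-symbol substrings and the alternate strategy can be sketched in Python as below. The names `build_index` and `find_with_index` are assumptions made for this illustration, and the lookup assumes the pattern is at least as long as the indexed substrings. It checks only enough substrings to cover the pattern (abc and bab for “abcbab”).

```python
from collections import defaultdict


def build_index(data: str, k: int = 3) -> dict[str, list[int]]:
    """Map every length-k substring of data to the sorted list of its locations."""
    index = defaultdict(list)
    for i in range(len(data) - k + 1):
        index[data[i:i + k]].append(i)
    return dict(index)


def find_with_index(index: dict[str, list[int]], pattern: str, k: int = 3) -> list[int]:
    """Find all occurrences of pattern using only index lookups."""
    m = len(pattern)
    if m < k:
        raise ValueError("pattern must have at least k symbols")
    # Offsets of length-k pieces that together cover the whole pattern.
    offsets = list(range(0, m - k + 1, k))
    if offsets[-1] != m - k:
        offsets.append(m - k)
    candidates = None
    for off in offsets:
        # Each piece found at location pos implies a pattern start at pos - off.
        starts = {pos - off for pos in index.get(pattern[off:off + k], [])}
        candidates = starts if candidates is None else candidates & starts
    return sorted(candidates)


# The 28-symbol data string from the example, with the spaces removed.
DATA = "a b c b b b a b c b a b b c a a b c b a b c b a b c b b".replace(" ", "")
INDEX = build_index(DATA)

print(INDEX["abc"])                       # [0, 6, 15, 19, 23]
print(find_with_index(INDEX, "abcbab"))   # [6, 15, 19]
```

The index is built once, in a single pass over the data string; each later query touches only the location lists of a few substrings rather than the whole data string.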

Search Engine Indexes

17.5B hits 3 years ago; 25.3B hits in mid 2012

667 hits a few years ago; 1M+ hits in mid 2012
