Data Structures and Algorithm Analysis - Lecture II, Study notes of Algorithms and Programming

George Mason University (GMU), Algorithms and Programming

The outline of a lecture in the Data Structures and Algorithm Analysis course at George Mason University. The lecture covers counting sorts, string matching, hashing, and B-trees. It includes detailed explanations of counting sorts, Horspool's algorithm for string matching, the basics of hashing, collisions, open hashing, closed hashing with linear probing, and storing data on disk.

Typology: Study notes

Pre 2010

Uploaded on 02/10/2009

CS 483 - Data Structures and Algorithm Analysis

Lecture VII: Chapter 7

R. Paul Wiegand

George Mason University, Department of Computer Science

March 29, 2006

R. Paul Wiegand George Mason University, Department of Computer Science

CS483 Lecture II

Outline

1 Introduction: Space vs. Time Tradeoff

2 Sorting by Counting

3 String Matching

4 Hashing

5 B-Trees

6 Homework

Comparison Count Sort

ComparisonCountingSort(A[0..n − 1])

    for i ← 0 to n − 1 do
        Count[i] ← 0
    for i ← 0 to n − 2 do
        for j ← i + 1 to n − 1 do
            if A[i] < A[j] then
                Count[j] ← Count[j] + 1
            else
                Count[i] ← Count[i] + 1
    for i ← 0 to n − 1 do
        S[Count[i]] ← A[i]
    return S

Example: A = 64 31 84 96 19 47

Count[0..5] after each pass:

    init     0  0  0  0  0  0
    i = 0    3  0  1  1  0  0
    i = 1    3  1  2  2  0  1
    i = 2    3  1  4  3  0  1
    i = 3    3  1  4  5  0  1
    i = 4    3  1  4  5  0  2
    i = 5    3  1  4  5  0  2

S = 19 31 47 64 84 96

$C(n) = \sum_{i=0}^{n-2} \sum_{j=i+1}^{n-1} 1 = \frac{n(n-1)}{2} \in \Theta(n^2)$

For each element to be sorted, count the total number of elements smaller than this element.
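
The counting idea can be sketched in Python as follows. This is a minimal illustration rather than the lecture's own code; the function name and the choice to return the count array alongside the sorted list are assumptions made for the example.

```python
def comparison_counting_sort(a):
    """Sort by counting, for each element, how many elements are smaller.

    Returns (sorted_list, count), where count[i] is the final position of a[i].
    """
    n = len(a)
    count = [0] * n
    # Compare every pair (i, j) with i < j exactly once.
    for i in range(n - 1):
        for j in range(i + 1, n):
            if a[i] < a[j]:
                count[j] += 1   # a[j] is larger, so it moves one slot right
            else:
                count[i] += 1   # a[i] is larger (or equal), so it moves right
    # count[i] is now the index of a[i] in the sorted output.
    s = [None] * n
    for i in range(n):
        s[count[i]] = a[i]
    return s, count


if __name__ == "__main__":
    print(comparison_counting_sort([64, 31, 84, 96, 19, 47]))
```

The nested loops perform n(n − 1)/2 comparisons regardless of the input, which matches the Θ(n²) count above; the extra Count and S arrays are the space paid for the simplicity.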

Distribution Counting

DistributionCountingSort(A[0..n − 1], ℓ, u)

    for j ← 0 to u − ℓ do
        D[j] ← 0
    for i ← 0 to n − 1 do
        D[A[i] − ℓ] ← D[A[i] − ℓ] + 1
    for j ← 1 to u − ℓ do
        D[j] ← D[j − 1] + D[j]
    for i ← n − 1 downto 0 do
        j ← A[i] − ℓ
        S[D[j] − 1] ← A[i]
        D[j] ← D[j] − 1
    return S

Sometimes the input is constrained: a fixed array of values, each in [ℓ, u]

Sometimes we want additional information

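After accumulation, D[0..2] = 1 4 6. Each row shows D[0..2] before A[i] is placed and S[0..5] after it is placed ("-" marks an empty slot).

                 D[0..2]      S[0..5]
    A[5] = 12    1  4  6      -   -   -   12  -   -
    A[4] = 12    1  3  6      -   -   12  12  -   -
    A[3] = 13    1  2  6      -   -   12  12  -   13
    A[2] = 12    1  2  5      -   12  12  12  -   13
    A[1] = 11    1  1  5      11  12  12  12  -   13
    A[0] = 13    0  1  5      11  12  12  12  13  13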
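
A short Python sketch of distribution counting, run on the array from the trace (A = 13 11 12 13 12 12 with ℓ = 11 and u = 13). The function and parameter names (`lo`, `hi`) are illustrative assumptions, not taken from the slides.

```python
def distribution_counting_sort(a, lo, hi):
    """Sort integers known to lie in [lo, hi] using a distribution array."""
    d = [0] * (hi - lo + 1)
    # Step 1: frequency of each value.
    for x in a:
        d[x - lo] += 1
    # Step 2: accumulate, so d[j] is the number of elements <= lo + j.
    for j in range(1, hi - lo + 1):
        d[j] += d[j - 1]
    # Step 3: scan right to left so equal values keep their relative order.
    s = [None] * len(a)
    for x in reversed(a):
        j = x - lo
        s[d[j] - 1] = x
        d[j] -= 1
    return s


if __name__ == "__main__":
    print(distribution_counting_sort([13, 11, 12, 13, 12, 12], 11, 13))
```

The running time is linear in n plus the size of the value range, so the approach pays off only when u − ℓ is small relative to n.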

Horspool’s Algorithm

Idea: When we shift, make as large a shift as possible

Match pattern from right to left

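Consider the character c of the text that was aligned against the last character of the pattern

$t(c) = \begin{cases} m, & \text{if } c \text{ is not among the first } m-1 \text{ characters of the pattern} \\ \text{the distance from the rightmost } c \text{ among the first } m-1 \text{ characters to the last character of the pattern}, & \text{otherwise} \end{cases}$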

Still Θ(n) in the average case and Θ(nm) in the worst case

But on average, much faster than brute force

ShiftTable(P[0..m − 1])

    initialize every entry of T to m
    for j ← 0 to m − 2 do
        T[P[j]] ← m − 1 − j
    return T

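May repeatedly overwrite the shift value for a given character; because the scan runs left to right, the rightmost occurrence among the first m − 1 characters determines the final value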
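
The shift table and the right-to-left matching loop can be sketched in Python as below. This is an illustrative version under a few assumptions: the table is a dictionary, characters absent from it default to a shift of m, and the sample pattern and text are chosen for the example rather than taken from the slides.

```python
def shift_table(pattern):
    """Map each of the first m - 1 pattern characters to its shift distance.

    Later (more rightward) occurrences overwrite earlier ones.
    Characters not in the table implicitly shift by m.
    """
    m = len(pattern)
    table = {}
    for j in range(m - 1):
        table[pattern[j]] = m - 1 - j
    return table


def horspool_search(pattern, text):
    """Return the index of the first match of pattern in text, or -1."""
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    table = shift_table(pattern)
    i = m - 1                      # text index aligned with the pattern's last char
    while i < n:
        k = 0                      # number of characters matched so far
        while k < m and pattern[m - 1 - k] == text[i - k]:
            k += 1
        if k == m:
            return i - m + 1
        # Shift by the table entry for the text character under the
        # pattern's last position (m if that character is not in the table).
        i += table.get(text[i], m)
    return -1


if __name__ == "__main__":
    print(shift_table("BARBER"))
    print(horspool_search("BARBER", "JIM_SAW_ME_IN_A_BARBERSHOP"))
```

Using `dict.get` with a default of m avoids allocating an entry for every symbol of the alphabet, which is a design choice of this sketch; an array indexed by character code would serve equally well for a small fixed alphabet.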

The Basics of Hashing

Hashes are often useful for implementing dictionaries (basic operations: Insert, Search, and Delete)

Construct a data type to store records by key value (a hash table), generally an array H[0..m − 1]

Use the key to access the table by computing its address with a predefined hash function, h(k)

If keys are nonnegative integers, a simple hash function is

h(k) = k mod m

For strings of a fixed length, we might use:

$h(k) = \left( \sum_{i=0}^{\ell-1} \mathrm{ord}(c_i) \right) \bmod m$

Or, where C is a constant larger than any ord(c_i):

    h ← 0
    for i ← 0 to ℓ − 1 do
        h ← (h · C + ord(c_i)) mod m
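
The three hash functions above might look like this in Python. The function names are made up for the example, and the default C = 256 is an assumption that only exceeds every ord(c) for single-byte characters; a different alphabet would need a different constant.

```python
def hash_int(k, m):
    """h(k) = k mod m for a nonnegative integer key."""
    return k % m


def hash_string_sum(s, m):
    """Sum of character codes, mod m. Anagrams always collide."""
    return sum(ord(c) for c in s) % m


def hash_string_poly(s, m, C=256):
    """Treat the string as digits in base C, reducing mod m at each step."""
    h = 0
    for c in s:
        h = (h * C + ord(c)) % m
    return h


if __name__ == "__main__":
    m = 13
    for key in ("ab", "ba"):
        print(key, hash_string_sum(key, m), hash_string_poly(key, m))
```

The additive version ignores character order, so "ab" and "ba" land in the same cell, while the positional version usually separates them. Reducing mod m inside the loop keeps intermediate values small in languages with fixed-width integers.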

Collisions

Hash functions should try to:

1 Distribute keys in the table as evenly as possible

2 Be easy to compute

When the hash function computes the same value for different keys, a collision occurs

When m < n (n is the number of keys inserted into the table), a collision must occur

Even when m ≥ n, a collision is still possible (depending on the data and the hash function)

Hash implementations need a collision resolution method, such as:

Open hashing (separate chaining)

Closed hashing (open addressing)

Open Hashing

Each cell in the hash table is a linked list

Values are stored in the lists; collisions are handled by chaining values

If n keys are distributed evenly, each list is about the same size: n/m

Load factor: α = n/m

Average number of nodes visited during a successful search: $S \approx 1 + \frac{\alpha}{2}$

Average number of nodes visited during an unsuccessful search: $U \approx \alpha$

Example:

m = 5, h(k) = (suitvalue + cardvalue) mod m

Suit values: {♠, ♦, ♥, ♣} = {42, 28, 14, 0}
Card values: {K, Q, J, A} = {13, 12, 11, 1}

    Data    A♦   5♣   9♠   7♠   K♥
    h(k)    4    0    1    4    2

Resulting table:

    cell 0: 5♣
    cell 1: 9♠
    cell 2: K♥
    cell 3: (empty)
    cell 4: A♦ → 7♠

When load factor is near 1 & keys are well distributed, access is Θ(1) on average
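
A small Python sketch of open hashing that reproduces the card example. Python lists stand in for the linked lists, the class and method names are invented for illustration, and number cards are assumed to be worth their face value (the slide lists values only for K, Q, J, and A).

```python
SUIT_VALUE = {"♠": 42, "♦": 28, "♥": 14, "♣": 0}
FACE_VALUE = {"K": 13, "Q": 12, "J": 11, "A": 1}


def card_hash(card, m):
    """h(k) = (suit value + card value) mod m, for cards such as '7♠'."""
    rank, suit = card[:-1], card[-1]
    # Assumption: number cards count as their printed number.
    value = FACE_VALUE[rank] if rank in FACE_VALUE else int(rank)
    return (SUIT_VALUE[suit] + value) % m


class ChainedHashTable:
    """Open hashing: each cell holds a chain of the keys that hash to it."""

    def __init__(self, m, hash_fn):
        self.m = m
        self.hash_fn = hash_fn
        self.cells = [[] for _ in range(m)]

    def insert(self, key):
        chain = self.cells[self.hash_fn(key, self.m)]
        if key not in chain:
            chain.append(key)      # colliding keys simply extend the chain

    def search(self, key):
        return key in self.cells[self.hash_fn(key, self.m)]

    def delete(self, key):
        chain = self.cells[self.hash_fn(key, self.m)]
        if key in chain:
            chain.remove(key)
            return True
        return False


if __name__ == "__main__":
    table = ChainedHashTable(5, card_hash)
    for card in ("A♦", "5♣", "9♠", "7♠", "K♥"):
        table.insert(card)
    print(table.cells)
```

Deletion is straightforward here because removing a key from one chain never affects how other keys are found, which contrasts with the closed hashing scheme discussed next.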

Closed Hashing with Linear Probing

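All keys are stored in the table itself

On a collision, we shift right until we find an open position

At the end of the table, we wrap back to the start

Delete is problematic (mark the cell as deleted and skip it when searching)

Average number of comparisons when successful: $S \approx \frac{1}{2}\left(1 + \frac{1}{1-\alpha}\right)$

Average number of comparisons when unsuccessful: $U \approx \frac{1}{2}\left(1 + \frac{1}{(1-\alpha)^2}\right)$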

Example:

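    Data    A♦   5♣   9♠   7♠   K♥
    h(k)    4    0    1    4    2

Table after each insertion ("-" marks an empty cell):

                  0    1    2    3    4
    insert A♦     -    -    -    -    A♦
    insert 5♣     5♣   -    -    -    A♦
    insert 9♠     5♣   9♠   -    -    A♦
    insert 7♠     5♣   9♠   7♠   -    A♦
    insert K♥     5♣   9♠   7♠   K♥   A♦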
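
The same insertions can be traced with a minimal Python sketch of linear probing. To keep the block self-contained, the cards are represented by their integer values (suit value + card value: 29, 5, 51, 49, 27) with h(k) = k mod m. The class name, the tombstone marker, and the method names are assumptions for illustration.

```python
EMPTY = None
DELETED = object()   # tombstone: "mark and skip" for deleted cells


class LinearProbingTable:
    """Closed hashing: all keys live in the table; collisions probe rightward."""

    def __init__(self, m):
        self.m = m
        self.slots = [EMPTY] * m

    def _probe(self, key):
        start = key % self.m
        for step in range(self.m):
            yield (start + step) % self.m   # wrap around at the end

    def insert(self, key):
        first_free = None
        for idx in self._probe(key):
            slot = self.slots[idx]
            if slot is EMPTY:
                if first_free is None:
                    first_free = idx
                break                        # key cannot be further along
            if slot is DELETED:
                if first_free is None:
                    first_free = idx         # reusable, but keep looking
            elif slot == key:
                return idx                   # already present
        if first_free is None:
            raise OverflowError("hash table is full")
        self.slots[first_free] = key
        return first_free

    def search(self, key):
        for idx in self._probe(key):
            slot = self.slots[idx]
            if slot is EMPTY:
                return -1                    # an empty cell ends the search
            if slot is not DELETED and slot == key:
                return idx
        return -1

    def delete(self, key):
        idx = self.search(key)
        if idx == -1:
            return False
        self.slots[idx] = DELETED            # do not break later probe chains
        return True


if __name__ == "__main__":
    table = LinearProbingTable(5)
    print([table.insert(k) for k in (29, 5, 51, 49, 27)])
    print(table.slots)
```

The tombstone is what makes deletion safe: if the cell were simply emptied, a later search for a key that had probed past it would stop early and report the key as missing.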

Clustering & Double Hashing

[Figure: number of comparisons for unsuccessful and successful searches]

The main problem is clustering

A cluster is a sequence of consecutive filled positions in the table

One possible solution: double hashing

Use a second hash function to compute the probe interval

$h_2(k) = m - 2 - k \bmod (m - 2)$

We need $h_2(k)$ and m to be relatively prime (only common divisor is 1)

Choosing a prime m ensures this

We can still have problems as α approaches 1

Only solution: rehash (scan the table and relocate its entries into a bigger table)
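
A compact Python sketch of double hashing and rehashing, assuming integer keys, a primary hash h1(k) = k mod m, and the secondary hash given above. The function names and the use of a plain list with `None` for empty cells are illustrative choices, and the new table size passed to `rehash` is assumed to be prime.

```python
def h1(k, m):
    return k % m


def h2(k, m):
    """Probe interval from the slide: m - 2 - k mod (m - 2), in 1..m-2."""
    return m - 2 - k % (m - 2)


def probe_sequence(k, m):
    """Cells visited for key k: h1, h1 + h2, h1 + 2*h2, ... (mod m)."""
    step = h2(k, m)
    return [(h1(k, m) + i * step) % m for i in range(m)]


def insert(table, k):
    """Place k in the first free cell along its probe sequence."""
    for idx in probe_sequence(k, len(table)):
        if table[idx] is None or table[idx] == k:
            table[idx] = k
            return idx
    raise OverflowError("hash table is full")


def rehash(table, new_m):
    """Scan the old table and relocate every key into a bigger one."""
    bigger = [None] * new_m
    for k in table:
        if k is not None:
            insert(bigger, k)
    return bigger


if __name__ == "__main__":
    table = [None] * 5
    for key in (29, 5, 51, 49, 27):
        insert(table, key)
    print(table)
    print(probe_sequence(49, 5))
    print(rehash(table, 11))
```

With a prime m, every step size from h2 is relatively prime to m, so each probe sequence visits all m cells before repeating. Two keys that collide at the same cell usually get different step sizes, which is what breaks up the clusters that linear probing builds.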
