Hash Tables in Algorithms & Data Abstract Structures - CPSC 223, Fall 2010, Slides of Data Structures and Algorithms

Dhirubhai Ambani Institute of Information and Communication Technology Data Structures and Algorithms

Part of the lecture notes for the Algorithms & Data Abstract Structures course (CPSC 223) at the university of x, taught in the fall of 2010. The notes cover hash tables: the basic idea, advantages over arrays, hash functions, collisions, and resolving collisions using open addressing and separate chaining. The document also includes examples of hash functions and their performance.

Typology: Slides

2012/2013

Uploaded on 09/09/2013

zaid 🇮🇳

11/30/10

CPSC 223

Algorithms & Data Abstract Structures

Lecture 24: Hash Tables

Today …

• Hash Tables [Ch 12: 686-706]
• Reminders:
  - Project presentations Thursday …
  - Guest lecture next Tuesday
  - Next week: (re)read “The data-structure canon”

CPSC 223 -- Fall 2010

B-Trees versus Arrays

What are the advantages of balanced search trees over arrays for storing collections of data items?

• Output (traversal) in sorted order
• Faster retrieve (and lookup) …
  - O(n) for (unsorted) arrays, O(log n) for balanced search trees

Can we improve search time for arrays?

Yes! Using hash tables …

Hash Tables

Basic Idea

• Define a “hash function” h
  - h : Key → Index
• Make h fast (e.g., constant time)
• This makes retrieve O(1)!
  - … which is even faster than in BSTs

[Figure: h maps keys to array indexes of the table (0, 1, 2, …, n – 1)]

Hash Functions

“Perfect Hash Functions”
• Map each key to a unique array index
• Hard if you do not know all search key values to expect
• Note you may also have more keys than indexes

Most Hash Functions
• Map two or more keys to the same index
• This results in “collisions”
• We have to deal with collisions (more later) …
• … but we also want hash functions that minimize collisions

Examples of Hash Functions (from textbook)

Assumptions
• Keys are positive integers
• We have a hash table (array) of 100 elements (0 .. 99)

“Selecting digits”
• Select digits of the key to use as the hash value
• Let's say keys are 9-digit employee numbers
  - h(k) = 4th and 9th digits
  - For example: h(001364825) = 35
  - Here we store (retrieve) the entry with key 001364825 at table[35]
• This is a fast and simple approach, but
  - May not evenly distribute data
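The digit-selection example can be checked with a short Python sketch. The function name `select_digits` and the choice to pass keys as strings (so that leading zeros survive) are assumptions made for illustration; they are not from the slides.

```python
def select_digits(key: str, positions=(4, 9)) -> int:
    """Hash a digit string by concatenating the digits at the given 1-based positions."""
    # Keys are strings here so that leading zeros (e.g. "001364825") are preserved.
    return int("".join(key[p - 1] for p in positions))


print(select_digits("001364825"))  # 35
```

With two selected digits the result always falls in 0 .. 99, which matches the assumed table size, but keys that share those two digits all land in the same slot.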

Examples of Hash Functions (from textbook)

“Folding”
• Add digits instead
• Let's say keys are 9-digit employee numbers
  - h(k) = i1 + i2 + … + i9 where k = i1 i2 … i9
  - For example: h(001364825) = 29
  - Store (retrieve) the entry with key 001364825 at table[29]
• This is also fast, but
  - Also may not evenly distribute data
  - In this example, hash values only range from 0 to 81
  - Can pick different schemes (like i1 i2 i3 + i4 i5 i6 + i7 i8 i9)
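A minimal sketch of both folding schemes, assuming keys are passed as digit strings. The names `fold_digits` and `fold_groups` are hypothetical.

```python
def fold_digits(key: str) -> int:
    """Add the individual digits of the key."""
    return sum(int(d) for d in key)


def fold_groups(key: str, width: int = 3) -> int:
    """Add the key's digits in groups of `width` (i1 i2 i3 + i4 i5 i6 + ...)."""
    return sum(int(key[i:i + width]) for i in range(0, len(key), width))


print(fold_digits("001364825"))  # 29
print(fold_digits("999999999"))  # 81, the largest sum for a 9-digit key
print(fold_groups("001364825"))  # 1190 = 1 + 364 + 825
```

The grouped sum uses a wider range of values, but it can exceed the table size, so it still has to be reduced to a valid index before use.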

Examples of Hash Functions (from textbook)

“Modular Arithmetic”
• Sometimes we end up with indexes outside of the range of table indexes
• We can use the modulo operator (%) to map values to valid table indexes
  - h(k) = i mod table size, where i is the value computed from the key
  - In our example we can use the key directly … h(001364825) = 1,364,825 mod 100 = 25
• Key values used may require carefully chosen table sizes
  - E.g., 110 mod 100, 210 mod 100, 310 mod 100, etc. all give 10
  - A convention to more evenly distribute values is to use a prime number as the table size (e.g., 101 in this case)
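The effect of the table size can be seen directly in a few lines of Python. The helper name `mod_hash` is an assumption for illustration.

```python
def mod_hash(value: int, table_size: int) -> int:
    """Map an integer value to a valid table index."""
    return value % table_size


print(mod_hash(1364825, 100))                       # 25
print([mod_hash(k, 100) for k in (110, 210, 310)])  # [10, 10, 10]
print([mod_hash(k, 101) for k in (110, 210, 310)])  # [9, 8, 7]
```

With a table size of 100 the three keys collide because only their last two digits matter; with the prime size 101 they spread over three different indexes.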

Resolving Collisions (insert)

Two general approaches
• Open Addressing
  - If a location is occupied, then find another location
• Restructuring the Hash Table
  - Add more room to the hash table to store collisions

Approach 1: Open Addressing

If a location is taken, “probe” (search) the array for the next “open” (available) index

• Linear probing
  - Search for the next available index sequentially
  - Take the next free index
  - If at the end, start at position 0
  - Search works similarly
  - Deletion is tricky
    - Mark indexes as “deleted” so we don’t throw off search
  - Linear probing can create large “primary” clusters

[Figure: key k4 hashes to h(k4) = i = 1, which is taken, so the probe continues at i + 1, i + 2, i + 3]
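The linear probing rules above can be sketched as a small Python class. This is a sketch under stated assumptions, not the textbook's implementation: keys are non-negative integers, the hash is `key % size`, and the class name and the `EMPTY`/`DELETED` sentinels are hypothetical.

```python
EMPTY = object()    # slot has never been used
DELETED = object()  # tombstone: slot was used, then its key was removed


class LinearProbingTable:
    def __init__(self, size=7):
        self.size = size
        self.slots = [EMPTY] * size

    def _probe(self, key):
        """Yield indexes h(k), h(k) + 1, ... wrapping around to position 0."""
        start = key % self.size
        for step in range(self.size):
            yield (start + step) % self.size

    def contains(self, key):
        for i in self._probe(key):
            if self.slots[i] is EMPTY:  # a never-used slot ends the search
                return False
            if self.slots[i] is not DELETED and self.slots[i] == key:
                return True
        return False

    def insert(self, key):
        if self.contains(key):  # avoid storing the same key twice
            return
        for i in self._probe(key):
            if self.slots[i] is EMPTY or self.slots[i] is DELETED:
                self.slots[i] = key
                return
        raise RuntimeError("table is full")

    def delete(self, key):
        for i in self._probe(key):
            if self.slots[i] is EMPTY:
                return
            if self.slots[i] is not DELETED and self.slots[i] == key:
                self.slots[i] = DELETED  # keep later keys reachable
                return


table = LinearProbingTable(size=7)
for k in (8, 15, 22):        # all three hash to index 1
    table.insert(k)
table.delete(15)
print(table.contains(22))    # True: the search skips the tombstone at index 2
```

If `delete` reset the slot to `EMPTY` instead, the search for 22 would stop at index 2 and wrongly report the key as missing, which is why deletion needs the “deleted” marker.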

Approach 1: Open Addressing

If a location is taken, “probe” (search) the array for the next “open” (available) index

• Quadratic probing
  - Helps eliminate “primary” clusters
  - Instead of sequentially probing
  - Probe “quadratic” sequences
    - i + 1^2, i + 2^2, i + 3^2, i + 4^2, …
  - Creates “secondary” clusters since collisions use the same sequences

[Figure: key k3 hashes to h(k3) = i = 1, then probes i + 1 and i + 4 (index 5)]
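A short sketch of the quadratic probe sequence, assuming the home index i has already been computed and that probes wrap around the table with a modulo. The function name `quadratic_probes` and the table size 11 are illustrative choices.

```python
def quadratic_probes(home: int, table_size: int, count: int):
    """Return the first `count` indexes probed: i, i + 1^2, i + 2^2, ..."""
    return [(home + j * j) % table_size for j in range(count)]


print(quadratic_probes(1, 11, 5))  # [1, 2, 5, 10, 6]
```

Any two keys with the same home index follow exactly this sequence, which is the source of the “secondary” clusters.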

Approach 1: Open Addressing

If a location is taken, “probe” (search) the array for the next “open” (available) index

• Double hashing
  - Further reduces clustering
  - Use a second hash function h2 to determine the size of sequence steps
  - Note steps depend on key value

[Figure: key k4 hashes to h(k4) = i = 1, then probes at i + h2(k4)]
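The slides do not say which second hash function to use. The sketch below assumes integer keys, h1(k) = k mod m, and the common choice h2(k) = 1 + (k mod (m - 1)), which is never zero; both functions and all names are assumptions for illustration.

```python
def h1(key: int, table_size: int) -> int:
    return key % table_size


def h2(key: int, table_size: int) -> int:
    # Assumed second hash; adding 1 guarantees a non-zero step size.
    return 1 + key % (table_size - 1)


def double_hash_probes(key: int, table_size: int, count: int):
    """Return the first `count` indexes probed for `key`."""
    start, step = h1(key, table_size), h2(key, table_size)
    return [(start + j * step) % table_size for j in range(count)]


print(double_hash_probes(12, 11, 4))  # [1, 4, 7, 10]
print(double_hash_probes(23, 11, 4))  # [1, 5, 9, 2]
```

Keys 12 and 23 share the home index 1 but step by 3 and 4 respectively, so their probe sequences diverge after the first collision. A prime table size ensures every step size eventually visits all slots.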

Approach 2: Restructure Array

Change the structure of the hash table to hold multiple items in the same position

• Separate Chaining (HW10)
  - Instead of using a static array, use linked lists (chains)
  - We end up with “Chain Nodes” holding entries

Approach 2: Restructure Array

Separate Chaining
• Instead of using a static array, use a linked list (chain)
• End up with “Chain Nodes” holding entries

[Figure: h maps a key to a position in the table (array); linked lists, one per table location]
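A minimal separate-chaining sketch in Python. As simplifying assumptions, each chain is a Python list standing in for a linked list, Python's built-in `hash` stands in for h, and the class and method names are hypothetical.

```python
class ChainedHashTable:
    def __init__(self, size=5):
        # One chain per table location.
        self.buckets = [[] for _ in range(size)]

    def _index(self, key):
        return hash(key) % len(self.buckets)

    def insert(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:                # key already present: replace its value
                bucket[i] = (key, value)
                return
        bucket.insert(0, (key, value))  # new items go to the front of the chain

    def retrieve(self, key):
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        return None

    def remove(self, key):
        idx = self._index(key)
        self.buckets[idx] = [(k, v) for k, v in self.buckets[idx] if k != key]


table = ChainedHashTable(size=5)
table.insert(2, "a")
table.insert(7, "b")      # 7 % 5 == 2, so this collides with key 2
print(table.retrieve(7))  # b
```

Note that this sketch scans the chain before inserting so that a key is stored only once; skipping that check is what makes insertion constant time.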

The Cost of Hashing

• Ideally
  - Insert, Delete, and Retrieve are O(1)
  - Traversal is O(n) … but the result is not sorted
• In practice
  - Collisions increase the cost
  - Cost depends on the “load factor” α ... how full the table is
    α = # items in table / table size
  - As the table fills, the chances of collisions increase
  - Thus hashing efficiency decreases as the load factor increases
  - Note that α > 1 if there are more items than array positions

The Cost of Hashing

Cost of Separate Chaining
• Insertion is still O(1)
  - New items are added to the front of the linked list
• Deletion and retrieval may require searching the entire linked list (chain)
• So again, cost depends on collisions
  - Here α is the average length of each linked list (assuming a “good” hash function)
  - But since α = n / table size and the table size is a constant, search is worst-case O(n)
• In practice, hash tables are efficient at searching though!
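The relation between load factor and chain length can be checked numerically. The sketch below assumes integer keys hashed with `key % table_size`; the helper `chain_lengths`, the table size 101, and the random keys are illustrative choices, not data from the lecture.

```python
import random


def chain_lengths(keys, table_size):
    """Count how many keys land in each table location."""
    lengths = [0] * table_size
    for k in keys:
        lengths[k % table_size] += 1
    return lengths


random.seed(0)
keys = random.sample(range(1_000_000), 500)
lengths = chain_lengths(keys, 101)
print(sum(lengths) / len(lengths))  # average chain length, same value as 500 / 101

# Worst case: every key is a multiple of the table size, so all share one chain.
bad_keys = [101 * i for i in range(50)]
print(max(chain_lengths(bad_keys, 101)))  # 50
```

The average chain length always equals n / table size; a good hash function keeps individual chains close to that average, while a bad key pattern puts all n items in one chain and makes search O(n).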

Assignment 10 – Hash Table

HashTable
• Maps each keyword to a table index (hash function)
• Each table index contains a (linked) List of ChainNodes
• In Dictionary
  - insert involves adding (keyword, Entry) pairs
  - remove involves removing Entries (and possibly ChainNodes)
  - search (new operation) finds and returns Entries given a keyword

[Figure: a Table with indexes 0 to 5; ChainNode C1 (keyword = “device”) and ChainNode C2 (keyword = “contrivance”); Lists L1, L2, and L3 holding Entry objects (e1, e2). If h(device) = 2 and h(contrivance) = 2, both ChainNodes are stored at table index 2]
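The structure described for the assignment can be sketched as follows. This is not the assignment's actual interface: the slides do not define `Entry`, so plain strings stand in for entries, the character-sum hash is a hypothetical choice, and Python lists stand in for the linked lists.

```python
from dataclasses import dataclass, field


@dataclass
class ChainNode:
    keyword: str
    entries: list = field(default_factory=list)


class HashTable:
    def __init__(self, size=6):
        self.table = [[] for _ in range(size)]  # one list of ChainNodes per index

    def _index(self, keyword):
        # Hypothetical hash function: sum of character codes modulo the table size.
        return sum(ord(c) for c in keyword) % len(self.table)

    def insert(self, keyword, entry):
        chain = self.table[self._index(keyword)]
        for node in chain:
            if node.keyword == keyword:
                node.entries.append(entry)
                return
        chain.insert(0, ChainNode(keyword, [entry]))

    def search(self, keyword):
        for node in self.table[self._index(keyword)]:
            if node.keyword == keyword:
                return node.entries
        return []

    def remove(self, keyword, entry):
        chain = self.table[self._index(keyword)]
        for node in chain:
            if node.keyword == keyword:
                if entry in node.entries:
                    node.entries.remove(entry)
                if not node.entries:    # drop the ChainNode once it is empty
                    chain.remove(node)
                return


table = HashTable()
table.insert("device", "e1")
table.insert("device", "e2")
table.insert("contrivance", "e1")
print(table.search("device"))  # ['e1', 'e2']
```

Each ChainNode groups all entries for one keyword, so removing the last entry for a keyword also removes its ChainNode.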
