Hash Table Implementation and Collision Strategies

Prof. David J. Galles, University of San Francisco (USF), Data Structures and Algorithms

These study notes cover implementing a database of key/value pairs with sorted lists, binary search trees, and unsorted arrays, and then with hash tables: hash functions, collision strategies, and techniques for minimizing collisions. Specific examples of integer and string hash functions are provided.

Uploaded on 07/30/2009


CS245-2009S-13 Hash Tables


13-0: Searching & Selecting

Maintain a Database (keys and associated data)
Operations:
- Add a key / value pair to the database
- Remove a key (and associated value) from the database
- Find the value associated with a key

13-1: Sorted List Implementation

If database is implemented as a sorted list:

Add
Remove
Find

13-2: Sorted List Implementation

If database is implemented as a sorted list:

Add O(n)
Remove O(n)
Find O(lg n)
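A minimal sketch of the sorted-list approach in Java, assuming int keys, String values, and a growable sorted array (the class and method names are illustrative, not from the notes). Find uses binary search, which is where the O(lg n) comes from; Add and Remove can locate the position just as quickly, but then have to shift every larger element over by one slot, which is the O(n) cost.

```java
import java.util.Arrays;

// Sorted-array dictionary sketch: keys[0..size-1] is kept in increasing order.
class SortedArrayDictionary {
    private int[] keys = new int[4];
    private String[] values = new String[4];
    private int size = 0;

    // Binary search: index of key, or -(insertionPoint + 1) if absent.
    private int search(int key) {
        return Arrays.binarySearch(keys, 0, size, key);
    }

    // Find: O(lg n) comparisons.
    String find(int key) {
        int i = search(key);
        return i >= 0 ? values[i] : null;
    }

    // Add: O(lg n) to locate the slot, O(n) to shift larger keys right.
    void add(int key, String value) {
        int i = search(key);
        if (i >= 0) { values[i] = value; return; }   // key present: overwrite value
        int pos = -(i + 1);
        if (size == keys.length) {                   // grow the backing arrays
            keys = Arrays.copyOf(keys, 2 * size);
            values = Arrays.copyOf(values, 2 * size);
        }
        System.arraycopy(keys, pos, keys, pos + 1, size - pos);
        System.arraycopy(values, pos, values, pos + 1, size - pos);
        keys[pos] = key;
        values[pos] = value;
        size++;
    }

    // Remove: O(lg n) to locate, O(n) to shift larger keys left.
    void remove(int key) {
        int i = search(key);
        if (i < 0) return;
        System.arraycopy(keys, i + 1, keys, i, size - i - 1);
        System.arraycopy(values, i + 1, values, i, size - i - 1);
        size--;
        values[size] = null;
    }

    public static void main(String[] args) {
        SortedArrayDictionary d = new SortedArrayDictionary();
        d.add(15, "fifteen");
        d.add(3, "three");
        d.add(8, "eight");
        System.out.println(d.find(8));   // eight
        d.remove(8);
        System.out.println(d.find(8));   // null
    }
}
```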

13-3: BST Implementation

If database is implemented as a Binary Search Tree:

Add
Remove
Find

13-4: BST Implementation

If database is implemented as a Binary Search Tree:

Add O(lg n) best, O(n) worst
Remove O(lg n) best, O(n) worst
Find O(lg n) best, O(n) worst

13-5: Unsorted List

Maintain an unsorted, non-contiguous array of elements

How long does a Find take?
How long does a Remove take?
How long does an Add take?

Does this sound like a good idea?

13-6: Hash Function

What if we had a “magic function” that:
- Takes a key as input
- Returns the index in the array where the key can be found, if the key is in the array
To add an element
- Put the key through the magic function, to get a location
- Store element in that location
To find an element
- Put the key through the magic function, to get a location
- See if the key is stored in that location
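A sketch of the add/find steps above, under a loudly stated assumption: the hypothetical `magic` function here is just key mod table size, and the example only works because the stored keys never share an index. There is no collision handling at all; the rest of these notes is about what to do when that assumption fails.

```java
// Hypothetical "magic function" table. Assumption: no two stored keys ever
// map to the same index, so there is no collision handling.
class MagicTable {
    private final Integer[] keys;
    private final String[] values;

    MagicTable(int size) {
        keys = new Integer[size];
        values = new String[size];
    }

    // Stand-in for the magic function (hypothetical): key mod table size.
    private int magic(int key) {
        return Math.floorMod(key, keys.length);
    }

    // To add: put the key through the magic function, store the element there.
    void add(int key, String value) {
        int i = magic(key);
        keys[i] = key;
        values[i] = value;
    }

    // To find: put the key through the magic function, see if the key is there.
    String find(int key) {
        int i = magic(key);
        return (keys[i] != null && keys[i] == key) ? values[i] : null;
    }

    public static void main(String[] args) {
        MagicTable t = new MagicTable(10);
        t.add(3, "three");
        t.add(15, "fifteen");
        System.out.println(t.find(15));   // fifteen
        System.out.println(t.find(13));   // null: index 3 holds key 3, not 13
    }
}
```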

13-7: Hash Function

The “magic function” is called a Hash function
If hash(key) = i, we say that the key hashes to the value i
We’d like to ensure that different keys will always hash to different values.
Why is this not possible?

13-8: Hash Function

The “magic function” is called a Hash function
If hash(key) = i, we say that the key hashes to the value i
We’d like to ensure that different keys will always hash to different values.
Why is this not possible?
- Too many possible keys
- If keys are strings of up to 15 letters, there are more than 10^21 different keys
- 10^21 is 1 sextillion: roughly the number of grains of salt it would take to fill this room one million times over.
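A quick check of the arithmetic, assuming the "letters" are the 26 lowercase English letters and "up to 15" means lengths 1 through 15. `BigInteger` is used because the total does not fit in a 64-bit `long`.

```java
import java.math.BigInteger;

// Counts the strings of length 1..maxLength over an alphabet of the given size.
class KeyCount {
    static BigInteger stringsUpTo(int maxLength, int alphabetSize) {
        BigInteger total = BigInteger.ZERO;
        BigInteger a = BigInteger.valueOf(alphabetSize);
        for (int len = 1; len <= maxLength; len++) {
            total = total.add(a.pow(len));   // alphabetSize^len strings of this length
        }
        return total;
    }

    public static void main(String[] args) {
        BigInteger n = stringsUpTo(15, 26);
        int digits = n.toString().length();
        System.out.println(n + " (about 10^" + (digits - 1) + ")");
    }
}
```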

13-9: Integer Hash Function

When two keys hash to the same value, a collision occurs.
We cannot avoid collisions, but we can minimize them by picking a hash function that distributes keys evenly through the array.
Example: Keys are integers
- Keys are in range 1... m
- Array indices are in range 1... n
- n << m

What if table size = 10, all keys end in 0?
What if table size is even, all keys are even?
In general, what if the table size and many of the keys share factors?
What can we do?
- Prevent keys and table size from sharing factors.
- No control over the keys.
- Make the table size prime.
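A sketch of the problem and the fix, assuming the usual integer hash h(k) = k mod table size (the slides that define it are not part of these notes) and 0-based array indices rather than the 1...n used above. With table size 10, keys that all end in 0 collide in a single bucket; with the prime table size 11, the same keys spread out.

```java
import java.util.HashSet;
import java.util.Set;

// Division-method integer hash and a count of how many buckets a key set uses.
class DivisionHash {
    static int hash(int key, int tableSize) {
        return Math.floorMod(key, tableSize);   // floorMod keeps negative keys in range
    }

    static int bucketsUsed(int[] keys, int tableSize) {
        Set<Integer> used = new HashSet<>();
        for (int k : keys) used.add(hash(k, tableSize));
        return used.size();
    }

    public static void main(String[] args) {
        int[] keys = {10, 20, 30, 40, 50, 60, 70, 80, 90, 100};   // all end in 0
        System.out.println(bucketsUsed(keys, 10));   // 1: every key lands in bucket 0
        System.out.println(bucketsUsed(keys, 11));   // 10: no collisions at all
    }
}
```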

13-17: String Hash Function

Hash tables are usually used to store string values
If we can convert a string into an integer, we can use the integer hash function
How can we convert a string into an integer?

13-18: String Hash Function

Hash tables are usually used to store string values
If we can convert a string into an integer, we can use the integer hash function
How can we convert a string into an integer?
- Add up ASCII values of the characters in the string

int hash(String key, int tableSize) {
    int hashvalue = 0;
    for (int i = 0; i < key.length(); i++)
        hashvalue += (int) key.charAt(i);   // add the character's ASCII value
    return hashvalue % tableSize;
}
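The additive hash is cheap to compute, but any two strings made of the same characters get the same sum, so anagrams always collide. The class below repeats the slide's function so it compiles on its own and hashes a few anagrams; the table size 101 is an arbitrary illustrative choice.

```java
// The slide's additive string hash, repeated so this example is self-contained.
class AdditiveHashDemo {
    static int hash(String key, int tableSize) {
        int hashvalue = 0;
        for (int i = 0; i < key.length(); i++)
            hashvalue += (int) key.charAt(i);   // order of characters is ignored
        return hashvalue % tableSize;
    }

    public static void main(String[] args) {
        // Anagrams contain the same characters, so their sums are identical.
        for (String s : new String[] {"stop", "pots", "tops", "spot"})
            System.out.println(s + " -> " + hash(s, 101));
    }
}
```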

13-19: String Hash Function

Hash tables are usually used to store string values
If we can convert a string into an integer, we can use the integer hash function
How can we convert a string into an integer?
- Concatenate ASCII digits together

$$\sum_{k=0}^{keysize-1} key[k] \cdot 256^{\,keysize-k-1}$$
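The sum above can be evaluated with Horner's rule (multiply the running value by 256, then add the next character), which yields the same number as the formula. This sketch uses `BigInteger` so nothing overflows and reports how many bits the value needs: about 8 per character, so a string longer than 8 ASCII characters no longer fits in a 64-bit `long`.

```java
import java.math.BigInteger;

// Exact base-256 "concatenation" of a string's character codes.
class ConcatHash {
    static BigInteger concat(String key) {
        BigInteger value = BigInteger.ZERO;
        BigInteger base = BigInteger.valueOf(256);
        for (int k = 0; k < key.length(); k++)
            value = value.multiply(base).add(BigInteger.valueOf(key.charAt(k)));   // Horner step
        return value;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"ab", "hashing", "fifteen letters"})
            System.out.println(s + ": " + concat(s).bitLength() + " bits");
    }
}
```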

13-20: String Hash Function

Concatenating digits does not work, since numbers get big too fast. Solutions:
- Overlap digits a little (use base of 32 instead of 256)
- Ignore early characters (shift them off the left side of the string)

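static long hash(String key, int tablesize) {
    long h = 0;
    for (int i = 0; i < key.length(); i++)
        h = (h << 4) + (int) key.charAt(i);   // shift left 4 bits, add the next character
    // Once bits reach the sign bit, h can be negative; floorMod keeps the result in 0..tablesize-1
    return Math.floorMod(h, (long) tablesize);
}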

13-21: ElfHash

For each new character, the hash value is shifted to the left, and the new character is added to the accumulated value.
If the string is long, the early characters will “fall off” the end of the hash value when it is shifted
- Early characters will not affect the hash value of large strings
Instead of letting them fall off the end of the hash value, the most significant bits can be shifted down into the middle of the hash value and XORed in.
Every character will influence the value of the hash function.
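The "fall off" problem is easy to see with the shift-by-4 hash from 13-20. This sketch assumes Java's 64-bit `long` and looks at the raw value before `% tablesize`: each step shifts earlier characters 4 bits further left, so after 16 more characters the first one has been shifted out entirely, and two 17-character strings that differ only in their first character get the same value.

```java
// Raw shift-by-4 string hash (no "% tablesize"), using 64-bit long arithmetic.
class FallOffDemo {
    static long shiftHash(String key) {
        long h = 0;
        for (int i = 0; i < key.length(); i++)
            h = (h << 4) + (int) key.charAt(i);   // high bits are silently discarded
        return h;
    }

    public static void main(String[] args) {
        String tail = "abcdefghijklmnop";          // 16 characters
        long h1 = shiftHash("X" + tail);
        long h2 = shiftHash("Y" + tail);
        System.out.println(h1 == h2);               // true: the first character no longer matters
    }
}
```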

13-22: ElfHash

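static long ELFhash(String key, int tablesize) {
    long h = 0;
    long g;
    for (int i = 0; i < key.length(); i++) {
        h = (h << 4) + (int) key.charAt(i);   // shift in the next character
        g = h & 0xF0000000L;                   // top 4 bits of a 32-bit hash value
        if (g != 0)
            h ^= g >>> 24;                     // fold them back into the middle
        h &= ~g;                               // then clear them
    }
    return h % tablesize;
}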

13-23: Collisions

When two keys hash to the same value, a collision occurs
A collision strategy tells us what to do when a collision occurs
Two basic collision strategies:
- Open Hashing (Closed Addressing, Separate Chaining)
- Closed Hashing (Open Addressing)

13-24: Open Hashing

The array does not store elements directly; each array cell stores a linked list of the elements that hash to it
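A minimal separate-chaining table in Java, assuming String keys and values and using `String.hashCode()` reduced with `Math.floorMod` as the hash function (an illustrative choice, not the notes' own). Each array cell holds a `LinkedList` of entries; add, find, and remove only ever walk the one list the key hashes to.

```java
import java.util.LinkedList;

// Open hashing (separate chaining): table[i] is the list of entries hashing to i.
class ChainedHashTable {
    private static final class Entry {
        final String key;
        String value;
        Entry(String key, String value) { this.key = key; this.value = value; }
    }

    private final LinkedList<Entry>[] table;

    @SuppressWarnings("unchecked")
    ChainedHashTable(int tableSize) {
        table = new LinkedList[tableSize];
        for (int i = 0; i < tableSize; i++) table[i] = new LinkedList<>();
    }

    private int hash(String key) {
        return Math.floorMod(key.hashCode(), table.length);
    }

    void add(String key, String value) {
        for (Entry e : table[hash(key)])
            if (e.key.equals(key)) { e.value = value; return; }   // update in place
        table[hash(key)].addFirst(new Entry(key, value));
    }

    String find(String key) {
        for (Entry e : table[hash(key)])
            if (e.key.equals(key)) return e.value;
        return null;
    }

    // Deletion is just a list removal (compare 13-35).
    void remove(String key) {
        table[hash(key)].removeIf(e -> e.key.equals(key));
    }

    public static void main(String[] args) {
        ChainedHashTable t = new ChainedHashTable(1);   // size 1: every key collides
        t.add("apple", "red");
        t.add("banana", "yellow");
        System.out.println(t.find("banana"));   // yellow
        t.remove("banana");
        System.out.println(t.find("banana"));   // null
    }
}
```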

Primary Clustering (a problem with linear probing)
- “Clumps” (long runs of consecutively filled array cells) tend to form
- Positive feedback system: the larger a clump, the more likely a new element will land in it, making it larger still.

13-30: Closed Hashing

Quadratic probing
- Find the smallest i, such that Array[hash(x) + f(i)] is empty
- Add x to Array[hash(x) + f(i)]
- f(i) = i^2

13-31: Closed Hashing

Quadratic probing
- Find the smallest i, such that Array[hash(x) + f(i)] is empty
- Add x to Array[hash(x) + f(i)]
- f(i) = i^2
Problems:
- The probe sequence may not reach every cell in the table

13-32: Closed Hashing

Quadratic probing
- Find the smallest i, such that Array[hash(x) + f(i)] is empty
- Add x to Array[hash(x) + f(i)]
- f(i) = i^2
Problems:
- The probe sequence may not reach every cell in the table
- (if the table is less than 1/2 full and the table size is prime, we are guaranteed to be able to add an element)
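A sketch of quadratic-probing insertion, assuming int keys, hash(x) = x mod table size, and wrap-around of hash(x) + f(i) modulo the table size (the slides leave the wrap implicit). The usage in `main` fills 4 of 7 cells with keys that all hash to 0; the offsets i^2 mod 7 only take the values 0, 1, 2, and 4, so a fifth such key finds no empty cell even though three cells are still free.

```java
// Quadratic probing: probe hash(x) + i^2 (mod table size) for i = 0, 1, 2, ...
class QuadraticProbing {
    // Returns the index where x was stored, or -1 if the first
    // tableSize probes all hit occupied cells.
    static int insert(Integer[] table, int x) {
        int n = table.length;
        int home = Math.floorMod(x, n);                    // hash(x)
        for (int i = 0; i < n; i++) {
            int idx = (int) ((home + (long) i * i) % n);   // hash(x) + f(i), f(i) = i^2
            if (table[idx] == null) {
                table[idx] = x;
                return idx;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        Integer[] table = new Integer[7];                  // prime table size
        for (int x : new int[] {0, 7, 14, 21})             // all hash to 0
            System.out.println(x + " -> " + insert(table, x));
        System.out.println(28 + " -> " + insert(table, 28));   // -1: cells 3, 5, 6 are unreachable
    }
}
```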

13-33: Closed Hashing

Pseudo-Random
- Create a “Permutation Array” P
- f(i) = P[i]
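A sketch of pseudo-random probing, assuming int keys, hash(x) = x mod table size, and a permutation P of 0..tableSize-1 with P[0] = 0 (so the first probe is the home cell) and the remaining offsets shuffled once with a fixed seed. Every key uses the same P. Because P is a permutation, the probe sequence visits every cell exactly once, so an insert only fails when the table is completely full.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Pseudo-random probing: probe hash(x) + P[i] (mod table size).
class PseudoRandomProbing {
    final Integer[] table;
    final int[] p;   // the "Permutation Array"; p[0] stays 0

    PseudoRandomProbing(int tableSize, long seed) {
        table = new Integer[tableSize];
        List<Integer> offsets = new ArrayList<>();
        for (int k = 1; k < tableSize; k++) offsets.add(k);
        Collections.shuffle(offsets, new Random(seed));   // one fixed P shared by all keys
        p = new int[tableSize];
        for (int i = 1; i < tableSize; i++) p[i] = offsets.get(i - 1);
    }

    int insert(int x) {
        int n = table.length;
        int home = Math.floorMod(x, n);
        for (int i = 0; i < n; i++) {
            int idx = (home + p[i]) % n;                 // f(i) = P[i]
            if (table[idx] == null) { table[idx] = x; return idx; }
        }
        return -1;                                       // table is completely full
    }

    public static void main(String[] args) {
        PseudoRandomProbing t = new PseudoRandomProbing(7, 42L);
        for (int x = 0; x <= 49; x += 7)                 // eight keys, all hash to 0
            System.out.println(x + " -> " + t.insert(x)); // the eighth returns -1
    }
}
```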

13-34: Closed Hashing

Keys that hash to the same element follow the same probe sequence
- Secondary clustering
Double Hashing
- Use a secondary hash function to determine how far ahead to look

f(i) = i * hash2(key)
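A sketch of double hashing, assuming int keys, a prime table size, hash(x) = x mod table size, and hash2(x) = R - (x mod R) for a smaller prime R. That form of hash2 is a common textbook choice assumed here, not taken from these notes; it never returns 0, and with a prime table size every step size is coprime to it, so each probe sequence covers the whole table. Keys 7 and 14 share home cell 0 in a table of size 7 but, with R = 5, step by 3 and 1 respectively, so they do not follow each other.

```java
// Double hashing: probe hash(x) + i * hash2(x) (mod table size).
class DoubleHashing {
    static int insert(Integer[] table, int x, int r) {
        int n = table.length;
        int home = Math.floorMod(x, n);                    // hash(x)
        int step = r - Math.floorMod(x, r);                // hash2(x), always in 1..r
        for (int i = 0; i < n; i++) {
            int idx = (int) ((home + (long) i * step) % n);   // f(i) = i * hash2(x)
            if (table[idx] == null) { table[idx] = x; return idx; }
        }
        return -1;
    }

    public static void main(String[] args) {
        Integer[] table = new Integer[7];
        for (int x : new int[] {0, 7, 14})                  // all hash to 0
            System.out.println(x + " -> " + insert(table, x, 5));
    }
}
```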

13-35: Deletion

Deletion from an open hash table is easy.
- Find the element.
- Delete it.
Deletion from a closed hash table is harder.
- Why?

13-36: Deletion

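Deleting from a closed hash table can cause problems: if we simply empty a cell, a later Find for a key whose probe sequence passed through that cell will stop early at the empty cell and report the key as missing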
Three different kinds of entries
- Empty cells
- Cells that contain data
- Cells that have been deleted (tombstones)

13-37: Deletion

To insert an element:
- Find the smallest i such that Table[hash(x) + f(i)] is either empty or deleted
To find an element:
- Try all values of i (starting with 0) until either
  - Table[hash(x) + f(i)] = x
  - Table[hash(x) + f(i)] is empty (not deleted)
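A sketch of closed hashing with tombstones that follows the insert and find rules above. For brevity it assumes int keys, linear probing f(i) = i, and that an inserted key is not already present; the `State` enum and method names are illustrative. Remove marks a cell DELETED instead of EMPTY, so a later find keeps probing past it.

```java
import java.util.Arrays;

// Closed hashing with three kinds of cells: empty, full, and deleted (tombstone).
class TombstoneTable {
    private enum State { EMPTY, FULL, DELETED }

    private final int[] keys;
    private final State[] state;

    TombstoneTable(int tableSize) {
        keys = new int[tableSize];
        state = new State[tableSize];
        Arrays.fill(state, State.EMPTY);
    }

    // hash(x) + f(i) with f(i) = i, wrapped to the table size
    private int probe(int x, int i) {
        return (Math.floorMod(x, keys.length) + i) % keys.length;
    }

    void insert(int x) {
        for (int i = 0; i < keys.length; i++) {
            int idx = probe(x, i);
            if (state[idx] != State.FULL) {          // empty OR deleted cell can be reused
                keys[idx] = x;
                state[idx] = State.FULL;
                return;
            }
        }
        throw new IllegalStateException("table full");
    }

    boolean find(int x) {
        for (int i = 0; i < keys.length; i++) {
            int idx = probe(x, i);
            if (state[idx] == State.EMPTY) return false;             // truly empty: stop
            if (state[idx] == State.FULL && keys[idx] == x) return true;
            // DELETED, or a different key: keep probing
        }
        return false;
    }

    void remove(int x) {
        for (int i = 0; i < keys.length; i++) {
            int idx = probe(x, i);
            if (state[idx] == State.EMPTY) return;                   // not present
            if (state[idx] == State.FULL && keys[idx] == x) {
                state[idx] = State.DELETED;                          // leave a tombstone
                return;
            }
        }
    }

    public static void main(String[] args) {
        TombstoneTable t = new TombstoneTable(7);
        t.insert(0);
        t.insert(7);
        t.insert(14);                   // 0, 7, 14 all hash to cell 0
        t.remove(7);
        System.out.println(t.find(14)); // true: the tombstone does not stop the search
    }
}
```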

13-38: Rehashing

What can we do when our closed hash table gets full?
- Or if the load (# of elements / table size) gets larger than 0.
- Create a new, larger table
  - New hash table will have a different hash function, since the table size is different
- Add each element in the old table to the new table

13-39: Rehashing

When we create a new table, it should be approx. twice as large as the old table
- A single insert can now require Θ(n) work
- ... but only after Θ(n) inserts
- Time for n inserts is Θ(n)
- Average time for an insert is still Θ(1)
What happens if we make the table 100 units larger, instead of twice as large?
- Remember to keep the table size prime!
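A sketch of rehashing for a closed hash table, assuming int keys, linear probing, and a maximum load factor of 0.5 (an illustrative threshold). When the next insert would push the load past that threshold, the table grows to the smallest prime at least twice the old size, and every existing key is re-inserted, since its index depends on the table size.

```java
// Closed hash table that doubles (to a prime size) and rehashes when too full.
class RehashingTable {
    private static final double MAX_LOAD = 0.5;
    private Integer[] table = new Integer[7];
    private int count = 0;

    static boolean isPrime(int n) {
        if (n < 2) return false;
        for (int d = 2; (long) d * d <= n; d++)
            if (n % d == 0) return false;
        return true;
    }

    static int nextPrime(int n) {
        while (!isPrime(n)) n++;
        return n;
    }

    // Linear probing; the hash depends on t.length, so keys move on rehash.
    private static void place(Integer[] t, int x) {
        int idx = Math.floorMod(x, t.length);
        while (t[idx] != null) idx = (idx + 1) % t.length;
        t[idx] = x;
    }

    void insert(int x) {
        if (count + 1 > MAX_LOAD * table.length) rehash();
        place(table, x);
        count++;
    }

    private void rehash() {
        Integer[] old = table;
        table = new Integer[nextPrime(2 * old.length)];   // about twice as large, and prime
        for (Integer x : old)
            if (x != null) place(table, x);               // Θ(n) work, but only occasionally
    }

    boolean contains(int x) {
        int idx = Math.floorMod(x, table.length);
        while (table[idx] != null) {
            if (table[idx] == x) return true;
            idx = (idx + 1) % table.length;
        }
        return false;
    }

    int tableSize() { return table.length; }

    public static void main(String[] args) {
        RehashingTable t = new RehashingTable();
        for (int x = 1; x <= 9; x++) {
            t.insert(x);
            System.out.println("after " + x + " inserts, table size " + t.tableSize());
        }
    }
}
```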
