Hash Tables with Buckets: Implementation and Concepts | Exams Data Structures and Algorithms

worksheet 25: Hash Tables with Buckets Name:

Worksheet 25: Hash Tables using Buckets

In the previous lesson you learned about the concept of hashing, and how it was used in an open address hash table. In this lesson you will explore a different approach to dealing with collisions, the idea of hash tables using buckets.

A hash table that uses buckets is really a combination of an array and a linked list. Each element in the array (the hash table) is a header for a linked list. All elements that hash into the same location will be stored in the list.

Each operation on the hash table divides into two steps. First, the element is hashed and the remainder taken after dividing by the table size. This yields a table index. Next, the linked list indicated by the table index is examined. The algorithms for this second step are very similar to those used in the linked list. For example, adding a new element is simply the following:

void HashTableAdd (struct hashTable *ht, EleType newValue) {
    /* compute hash value to find the correct bucket */
    long hash = HASH(newValue);
    int hashIndex = (int) (labs(hash) % ht->tablesize);
    listAdd(&ht->table[hashIndex], newValue);
    ht->count++;   /* Note: later might want to add resizing the table (below) */
}
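The add routine above relies on HASH, EleType, and listAdd, which the worksheet does not define. The self-contained sketch below substitutes hypothetical stand-ins: int elements, an identity HASH, and nodes prepended directly onto the bucket's list. It also reduces the hash as an unsigned value instead of calling labs, because labs(LONG_MIN) is undefined behavior in C; for negative hashes this picks a different, but still valid, bucket than the worksheet's formula.

```c
#include <stdlib.h>

struct link { int value; struct link *next; };

struct hashTable {
    struct link **table;  /* table[i] is the head of bucket i's list */
    int count;            /* number of elements stored */
    int tablesize;        /* number of buckets */
};

/* Hypothetical stand-in for the worksheet's HASH: an int hashes to itself. */
#define HASH(x) ((long) (x))

/* Step 1: turn a hash value into a table index by taking the remainder
   after dividing by the table size (done unsigned to avoid labs(LONG_MIN)). */
int bucketIndex(long hash, int tablesize) {
    return (int) ((unsigned long) hash % (unsigned long) tablesize);
}

/* Step 2: add the value to the list in that bucket.
   Returns 0 on success, or -1 if the node cannot be allocated. */
int hashTableAdd(struct hashTable *ht, int newValue) {
    int idx = bucketIndex(HASH(newValue), ht->tablesize);
    struct link *node = malloc(sizeof *node);
    if (node == NULL)
        return -1;
    node->value = newValue;
    node->next = ht->table[idx];   /* prepend: O(1) whatever the list length */
    ht->table[idx] = node;
    ht->count++;                   /* the table keeps its own element count */
    return 0;
}
```

With a table of 10 buckets, 3 and 13 both leave remainder 3, so they collide and share one list, with the most recently added value at the front.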

The contains test is performed using the list contains function in the appropriate bucket. The removal operation is similar, but should only decrement the item count if the value was actually removed from the list. This ensures the count is accurate. An alternative implementation of the size method would have been to loop over the buckets, asking each list for its size. What would have been the advantages and disadvantages of this approach?
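A sketch of contains, remove, and the alternative size method, assuming int elements, an identity hash, and one bare singly linked list per bucket (all hypothetical choices; the names struct link, bucketOf, and hashTableSizeByScan are illustrative). Both lookups search only the one bucket the value hashes to, and remove decrements count only after a node has actually been unlinked. The scanning size function shows the trade-off raised in the question: there is no counter to keep in sync, but every call walks every bucket.

```c
#include <stdlib.h>

struct link { int value; struct link *next; };

struct hashTable {
    struct link **table;  /* table[i] is the head of bucket i's list */
    int count;            /* number of elements stored */
    int tablesize;        /* number of buckets */
};

/* Hypothetical hash: the int itself, reduced unsigned modulo the table size. */
static int bucketOf(const struct hashTable *ht, int value) {
    return (int) ((unsigned long) (long) value % (unsigned long) ht->tablesize);
}

/* contains: a linked-list search restricted to one bucket. */
int hashTableContains(const struct hashTable *ht, int testElement) {
    for (struct link *p = ht->table[bucketOf(ht, testElement)]; p != NULL; p = p->next)
        if (p->value == testElement)
            return 1;
    return 0;
}

/* remove: unlink the first matching node, if any. */
void hashTableRemove(struct hashTable *ht, int testElement) {
    struct link **pp = &ht->table[bucketOf(ht, testElement)];
    while (*pp != NULL) {
        if ((*pp)->value == testElement) {
            struct link *dead = *pp;
            *pp = dead->next;   /* splice the node out of its bucket list */
            free(dead);
            ht->count--;        /* decrement only when something was removed */
            return;
        }
        pp = &(*pp)->next;
    }
    /* value not present: count is left unchanged */
}

/* Alternative size: these bare lists store no length, so each bucket is
   walked, costing O(tablesize + n) per call instead of O(1). */
int hashTableSizeByScan(const struct hashTable *ht) {
    int n = 0;
    for (int i = 0; i < ht->tablesize; i++)
        for (struct link *p = ht->table[i]; p != NULL; p = p->next)
            n++;
    return n;
}
```

If each list kept its own size, the scan would drop to O(tablesize) per call, which is still slower than reading a single counter.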

As with open address hash tables, the load factor (λ) is defined as the number of elements divided by the table size. In this structure the load factor can be larger than one, and represents the average number of elements stored in each list, assuming that the hash function distributes elements uniformly over all positions.
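As a worked instance of the definition, write n for the number of stored elements and m for the table size (symbols chosen here for illustration):

```latex
\lambda = \frac{n}{m}, \qquad
n = 250,\; m = 50 \;\Longrightarrow\; \lambda = \frac{250}{50} = 5
```

With a uniform hash function, each bucket's list then holds 5 elements on average, a value that an open address table, where λ can never exceed 1, could not reach.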

Since the running time of the contains test and removal is proportional to the length of the list being searched, their expected cost is O(λ). Therefore hash table operations are fast only if the load factor remains small. A typical technique is to resize the table (doubling the size, as with the vector and the open address hash table) if the load factor becomes larger than 10.
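A sketch of the resize step, assuming int elements, an identity hash, and one bare singly linked list per bucket (hypothetical choices, since the worksheet leaves these types open; resizeTable and maybeResize are illustrative names). Every node must be rehashed because its bucket index depends on the table size, but the nodes themselves are relinked into the doubled array rather than copied. The threshold of 10 is the one given in the text.

```c
#include <stdlib.h>

struct link { int value; struct link *next; };

struct hashTable {
    struct link **table;  /* heap-allocated array of bucket headers */
    int count;            /* number of elements stored */
    int tablesize;        /* number of buckets */
};

#define MAX_LOAD 10       /* resize once the load factor exceeds 10 */

static int bucketFor(int value, int tablesize) {
    return (int) ((unsigned long) (long) value % (unsigned long) tablesize);
}

/* Double the bucket array and move every node to its new bucket.
   Returns 0 on success, or -1 (table unchanged) if allocation fails. */
int resizeTable(struct hashTable *ht) {
    int newSize = 2 * ht->tablesize;
    struct link **newTable = malloc((size_t) newSize * sizeof *newTable);
    if (newTable == NULL)
        return -1;
    for (int i = 0; i < newSize; i++)
        newTable[i] = NULL;
    for (int i = 0; i < ht->tablesize; i++) {
        struct link *p = ht->table[i];
        while (p != NULL) {
            struct link *next = p->next;           /* save before relinking */
            int b = bucketFor(p->value, newSize);  /* index depends on size */
            p->next = newTable[b];
            newTable[b] = p;
            p = next;
        }
    }
    free(ht->table);
    ht->table = newTable;
    ht->tablesize = newSize;
    return 0;
}

/* Call after each add: resize when count / tablesize > MAX_LOAD,
   compared as count > MAX_LOAD * tablesize to avoid integer division. */
int maybeResize(struct hashTable *ht) {
    if (ht->count > MAX_LOAD * ht->tablesize)
        return resizeTable(ht);
    return 0;
}
```

The element count is unchanged by a resize; only the number of buckets, and therefore λ, changes.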

Complete the implementation of the HashTable class based on these ideas.

An Active Learning Approach to Data Structures using C 1



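struct hashTable {
    struct list *table;   /* array of list headers, one per bucket */
    int count;            /* number of elements currently stored */
    int tablesize;        /* number of buckets */
};

void initHashTable (struct hashTable *ht, int tableSize) {
    /* allocate tableSize empty lists and set count to 0 */
}

int hashTableSize (struct hashTable *ht) {
    return ht->count;
}

void HashTableAdd (struct hashTable *ht, EleType newValue) {
    /* compute hash value to find the correct bucket */
    long hash = HASH(newValue);
    int hashIndex = (int) (labs(hash) % ht->tablesize);
    listAdd(&ht->table[hashIndex], newValue);
    ht->count++;
}

int hashTableContains (struct hashTable *ht, EleType testElement) {
    /* use the list contains function on the bucket testElement hashes to */
}

void hashTableRemove (struct hashTable *ht, EleType testElement) {
    /* remove from the appropriate bucket; decrement count only if removed */
}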

On Your Own

What is a bucket?
How are these hash tables similar to those used in open address hashing? How are they different?
What is the definition of the load factor for a hash table? Assuming that the hash function in use distributes the elements evenly over all buckets, what is another interpretation of the load factor?
Explain how the hash table combines features of an array and a linked list.
Suppose you wanted to test the hash table abstraction. What would be good boundary test cases? Write a test harness to feed these test values into the hash table methods and verify the result.
Would it make sense to use a different data structure, such as an AVL tree, for the buckets? What would be the advantage of this approach? What would be the disadvantage?
An iterator for the hash table class must produce all elements from every bucket. This combines features of both the vector and the list iterator. Provide an implementation of this class. As with most iterators, the remove operation is the most complicated.

The bucket approach to hash tables is the most common form of this data structure. It is this technique that is used in the hash tables found in the Java standard library. There are three of these. The first, a HashSet, is similar to the data structure shown here. The HashMap and the Hashtable use the same technique, but provide a map-like interface.
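For the iterator exercise, a minimal sketch assuming int elements and an array of bare singly linked lists (a hypothetical layout; struct hashTableIterator and its functions are illustrative names, not a given API). The iterator pairs a bucket index, as a vector iterator would, with a node pointer, as a list iterator would, and skips over empty buckets. The remove operation mentioned in the exercise is deliberately left out.

```c
#include <stddef.h>

struct link { int value; struct link *next; };

struct hashTable {
    struct link **table;  /* table[i] is the head of bucket i's list */
    int count;
    int tablesize;
};

/* Vector-like part: which bucket. List-like part: which node in it. */
struct hashTableIterator {
    const struct hashTable *ht;
    int bucket;
    struct link *current;
};

void hashTableIteratorInit(struct hashTableIterator *it, const struct hashTable *ht) {
    it->ht = ht;
    it->bucket = 0;
    it->current = ht->tablesize > 0 ? ht->table[0] : NULL;
}

/* If the current list is exhausted, move forward to the next non-empty
   bucket. Returns 1 if another element is available, 0 otherwise. */
int hashTableIteratorHasNext(struct hashTableIterator *it) {
    while (it->current == NULL && it->bucket + 1 < it->ht->tablesize) {
        it->bucket++;
        it->current = it->ht->table[it->bucket];
    }
    return it->current != NULL;
}

/* Return the current element and step along its list.
   Precondition: hashTableIteratorHasNext just returned 1. */
int hashTableIteratorNext(struct hashTableIterator *it) {
    int v = it->current->value;
    it->current = it->current->next;
    return v;
}
```

Because hasNext does the bucket skipping, next stays a simple list step; a remove operation would additionally need a pointer to the link that precedes the last element returned.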
