Hash Tables with Buckets: Implementation and Concepts, Exams of Data Structures and Algorithms

An explanation of hash tables using buckets, a combination of an array and a linked list. It covers the concepts, algorithms for adding, containing, and removing elements, load factor, and resizing the table. The document also includes exercises for further exploration.

Typology: Exams

Pre 2010

Uploaded on 08/31/2009

Worksheet 25: Hash Tables using Buckets
In the previous lesson you learned about the concept of hashing, and how it was
used in an open address hash table. In this lesson you will explore a different
approach to dealing with collisions, the idea of hash tables using buckets.
A hash table that uses buckets is really a combination of an array and a linked
list. Each element in the array (the hash table) is a header for a linked list. All
elements that hash into the same location will be stored in the list.
Each operation on the hash table divides into two steps. First, the element is
hashed and the remainder is taken after dividing by the table size. This yields a
table index. Next, the linked list indicated by the table index is examined. The
algorithms for the latter are very similar to those used in the linked list. For
example, adding a new element is simply the following:
void HashTableAdd (struct hashTable *ht, EleType newValue) {
   // compute hash value to find the correct bucket
   long hash = HASH(newValue);
   int hashIndex = (int) (labs(hash) % ht->tablesize);
   // add the value to the linked list stored in that bucket
   listAdd(&ht->table[hashIndex], newValue);
   ht->count++; // Note: later might want to add resizing the table (below)
}
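The two steps can be followed end to end in a small self-contained sketch. It is only an illustration under stated assumptions: elements are plain ints, the hash function is the identity, and the bucket is a hand-rolled singly linked list. The names `struct link`, `intHash`, `initTable`, `bucketIndex`, and `tableAdd` are hypothetical stand-ins for the worksheet's list library and HASH macro, not part of it.

```c
#include <assert.h>
#include <stdlib.h>

typedef int EleType;                 /* assumption: elements are ints */

struct link {                        /* one node in a bucket's list */
    EleType value;
    struct link *next;
};

struct hashTable {
    struct link **table;             /* array of list heads, one per bucket */
    int count;                       /* number of elements stored */
    int tablesize;                   /* number of buckets */
};

/* assumption: identity hash for ints; a real hash would mix the bits */
static long intHash(EleType v) { return (long) v; }

void initTable(struct hashTable *ht, int tablesize) {
    assert(ht != NULL && tablesize > 0);
    ht->table = calloc((size_t) tablesize, sizeof(struct link *));
    assert(ht->table != NULL);       /* every bucket starts as an empty list */
    ht->count = 0;
    ht->tablesize = tablesize;
}

/* Step 1: hash the value and reduce it to a bucket index.
   (labs is undefined for LONG_MIN; this sketch ignores that case.) */
int bucketIndex(const struct hashTable *ht, EleType v) {
    return (int) (labs(intHash(v)) % ht->tablesize);
}

/* Step 2: push the value onto the front of that bucket's list. */
void tableAdd(struct hashTable *ht, EleType v) {
    int i = bucketIndex(ht, v);
    struct link *node = malloc(sizeof *node);
    assert(node != NULL);
    node->value = v;
    node->next = ht->table[i];
    ht->table[i] = node;
    ht->count++;
}
```

With five buckets, the values 7 and 12 both reduce to index 2, so they end up in the same list; that is a collision handled by the bucket rather than by probing.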
The contains test is performed using the list contains function in the appropriate
bucket. The removal operation is similar, but should only decrement the item
count if the value was actually removed from the list. This ensures the count is
accurate. An alternative implementation of the size method would have been to
loop over the buckets, asking each list for its size. What would have been the
advantages and disadvantages of this approach?
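One possible shape for these operations is sketched below, again assuming int elements, an identity hash, and a hand-rolled bucket list. All names (`tableInit`, `tableContains`, `tableRemove`, `tableSizeByCounting`, and so on) are hypothetical. The point to notice is that the counter is touched only on the path where a node is actually unlinked, and that the counting version of size has to visit every bucket.

```c
#include <assert.h>
#include <stdlib.h>

typedef int EleType;                 /* assumption: elements are ints */

struct link { EleType value; struct link *next; };

struct hashTable {
    struct link **table;             /* one list head per bucket */
    int count;
    int tablesize;
};

/* assumption: identity hash, reduced to a bucket index */
static int indexOf(const struct hashTable *ht, EleType v) {
    return (int) (labs((long) v) % ht->tablesize);
}

void tableInit(struct hashTable *ht, int tablesize) {
    assert(ht != NULL && tablesize > 0);
    ht->table = calloc((size_t) tablesize, sizeof(struct link *));
    assert(ht->table != NULL);
    ht->count = 0;
    ht->tablesize = tablesize;
}

void tableAdd(struct hashTable *ht, EleType v) {
    int i = indexOf(ht, v);
    struct link *node = malloc(sizeof *node);
    assert(node != NULL);
    node->value = v;
    node->next = ht->table[i];
    ht->table[i] = node;
    ht->count++;
}

/* Search only the one bucket the value could be in. */
int tableContains(const struct hashTable *ht, EleType v) {
    struct link *p;
    for (p = ht->table[indexOf(ht, v)]; p != NULL; p = p->next)
        if (p->value == v) return 1;
    return 0;
}

/* Remove the first matching node, if there is one. */
void tableRemove(struct hashTable *ht, EleType v) {
    struct link **pp = &ht->table[indexOf(ht, v)];
    while (*pp != NULL) {
        if ((*pp)->value == v) {
            struct link *dead = *pp;
            *pp = dead->next;        /* unlink the node */
            free(dead);
            ht->count--;             /* decrement only on a real removal */
            return;
        }
        pp = &(*pp)->next;
    }
    /* value not found: count is left unchanged */
}

/* Alternative size: walk every bucket instead of keeping a counter. */
int tableSizeByCounting(const struct hashTable *ht) {
    int i, n = 0;
    struct link *p;
    for (i = 0; i < ht->tablesize; i++)
        for (p = ht->table[i]; p != NULL; p = p->next)
            n++;
    return n;
}
```

Comparing `ht->count` with the result of `tableSizeByCounting` after a mix of adds and removes is a convenient consistency check when testing.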
As with open address hash tables, the load factor (λ) is defined as the number of
elements divided by the table size. In this structure the load factor can be larger
than one, and represents the average number of elements stored in each list,
assuming that the hash function distributes elements uniformly over all positions.
Since the running time of the contains test and removal is proportional to the
length of the list, they are O(λ). Therefore the execution time for hash tables is
fast only if the load factor remains small. A typical technique is to resize the table
(doubling the size, as with the vector and the open address hash table) if the load
factor becomes larger than 10.
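A resize step along these lines might look like the following sketch. It assumes int elements, an identity hash, and a hand-rolled bucket list; the names `MAX_LOAD`, `loadFactor`, `tableResize`, and the others are hypothetical. Every node has to be rehashed, because an element's bucket index depends on the table size. This version relinks the existing nodes rather than allocating new ones, which is one design choice among several.

```c
#include <assert.h>
#include <stdlib.h>

typedef int EleType;                 /* assumption: elements are ints */

struct link { EleType value; struct link *next; };
struct hashTable { struct link **table; int count; int tablesize; };

#define MAX_LOAD 10.0                /* threshold mentioned in the text */

/* assumption: identity hash, reduced modulo the given table size */
static int indexFor(EleType v, int tablesize) {
    return (int) (labs((long) v) % tablesize);
}

void tableInit(struct hashTable *ht, int tablesize) {
    assert(ht != NULL && tablesize > 0);
    ht->table = calloc((size_t) tablesize, sizeof(struct link *));
    assert(ht->table != NULL);
    ht->count = 0;
    ht->tablesize = tablesize;
}

/* Average number of elements per bucket. */
double loadFactor(const struct hashTable *ht) {
    return (double) ht->count / ht->tablesize;
}

/* Double the bucket array and relink every node into its new bucket. */
void tableResize(struct hashTable *ht) {
    int newSize = 2 * ht->tablesize;
    struct link **newTable = calloc((size_t) newSize, sizeof(struct link *));
    int i;
    assert(newTable != NULL);
    for (i = 0; i < ht->tablesize; i++) {
        struct link *p = ht->table[i];
        while (p != NULL) {
            struct link *next = p->next;      /* save before relinking */
            int j = indexFor(p->value, newSize);
            p->next = newTable[j];
            newTable[j] = p;
            p = next;
        }
    }
    free(ht->table);                 /* old array only; nodes were reused */
    ht->table = newTable;
    ht->tablesize = newSize;
}

void tableAdd(struct hashTable *ht, EleType v) {
    int i = indexFor(v, ht->tablesize);
    struct link *node = malloc(sizeof *node);
    assert(node != NULL);
    node->value = v;
    node->next = ht->table[i];
    ht->table[i] = node;
    ht->count++;
    if (loadFactor(ht) > MAX_LOAD)   /* grow once lists get too long */
        tableResize(ht);
}
```

Starting from two buckets, the table stays at size 2 through the twentieth insertion (load factor exactly 10) and doubles to 4 on the twenty-first.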
Complete the implementation of the HashTable class based on these ideas.
An Active Learning Approach to Data Structures using C 1

struct hashTable {
   struct list * table;
   int count;
   int tablesize;
};

void initHashTable (struct hashTable *ht, int tableSize) {

}

int hashTableSize (struct hashTable *ht) {
   return ht->count;
}

void HashTableAdd (struct hashTable *ht, EleType newValue) {
   // compute hash value to find the correct bucket
   long hash = HASH(newValue);
   int hashIndex = (int) (labs(hash) % ht->tablesize);
   listAdd(&ht->table[hashIndex], newValue);
   ht->count++;
}

int hashTableContains (struct hashTable *ht, EleType testElement) {

}

void hashTableRemove (struct hashTable *ht, EleType testElement) {

}

On Your Own

  1. What is a bucket?
  2. How are these hash tables similar to those used in open address hashing? How are they different?
  3. What is the definition of the load factor for a hash table? Assuming that the hash function in use distributes the elements evenly over all buckets, what is another interpretation of the load factor?
  4. Explain how the hash table combines features of an array and a linked list.
  5. Suppose you wanted to test the hash table abstraction. What would be good boundary test cases? Write a test harness to feed these test values into the hash table methods and verify the result.
  6. Would it make sense to use a different data structure, such as an AVL tree, for the buckets? What would be the advantage of this approach? What would be the disadvantage?
  7. An iterator for the hash table class must produce all elements from every bucket. This combines features of both the vector and the list iterator. Provide an implementation of this class. As with most iterators, the remove operation is the most complicated.

The bucket approach to hash tables is the most common form of this data structure. It is this technique that is used in the hash tables found in the Java standard library. There are three of these. The first, a HashSet, is similar to the data structure shown here. The HashMap and the Hashtable use the same technique, but provide a map-like interface.