Data Structures: Direct-Address Tables, Hash Tables, and Open-Address Hashing, Study notes of Algorithms and Programming

These notes cover three ways to implement a dictionary: direct-address tables, hash tables with chaining, and open-address hashing. A direct-address table stores each element in an array at the index equal to its key. A hash table uses a hash function to map keys to array indices and resolves collisions with chaining. Open-address hashing probes for an empty slot when a collision occurs. The notes also cover good hash functions and the average-case performance of chaining and of the common probing methods.

Typology: Study notes

Pre 2010

Uploaded on 07/30/2009

koofers-user-d4u


Direct-Address Tables

Let U = {0, ..., m − 1} be the set of possible keys.
Use array T[0 ... m − 1] as a direct-address table.
There is a 1-1 correspondence between keys and slots.

Direct-Address-Search(T, k)
    return T[k]

Direct-Address-Insert(T, x)
    T[key[x]] ← x

Direct-Address-Delete(T, x)
    T[key[x]] ← nil
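The three operations map directly onto array indexing. The following is a minimal Python sketch, not part of the original notes; the class name `DirectAddressTable` and the choice to store `(key, value)` pairs are illustrative assumptions, and `None` stands in for nil.

```python
class DirectAddressTable:
    """Direct-address table for keys in 0..m-1 (illustrative sketch)."""

    def __init__(self, m):
        # One slot per possible key; None plays the role of nil.
        self.slots = [None] * m

    def search(self, k):
        # Theta(1): the key is the index.
        return self.slots[k]

    def insert(self, key, value):
        # Theta(1): store the element at the slot named by its key.
        self.slots[key] = (key, value)

    def delete(self, key):
        # Theta(1): reset the slot to nil.
        self.slots[key] = None


table = DirectAddressTable(8)
table.insert(3, "c")
found = table.search(3)
table.delete(3)
missing = table.search(3)
```

Every operation is a single array access, which is where the Θ(1) bound comes from; the cost is that the array must have one slot for every key in U.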

Advantage: operations are Θ(1).
Disadvantage: Θ(|U|) space required.

Hash Tables

Let K be the set of keys to be stored.

Goal: use Θ(|K|) space and Θ(1) time/op.

Idea: Use array T[0 ... m − 1] as a hash table, and use a Θ(1) hash function h, where h : U → {0, ..., m − 1} maps from keys to slots.

A collision is when two keys map to the same slot.

Good Hash Functions

Division method: h(k) = k mod m
m is prime, not close to any 2^i.

Division variation: h(k) = (k mod M) mod m
M is a big prime, not close to any 2^i.
m is any number much smaller than M.
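Both division hashes are one-liners. The sketch below is an illustration only; the function names and the constants 701 and 1000000007 (both prime, neither close to a power of 2) are assumptions chosen for the example, not values from the notes.

```python
def hash_division(k, m):
    """Division method: h(k) = k mod m, with m a prime not near a power of 2."""
    return k % m


def hash_division_variation(k, big_prime, m):
    """Division variation: h(k) = (k mod M) mod m, with M a big prime and m << M."""
    return (k % big_prime) % m


M_SLOTS = 701               # assumed table size (prime)
BIG_PRIME = 1_000_000_007   # assumed big prime M

slot_a = hash_division(123456, M_SLOTS)
slot_b = hash_division_variation(2_000_000_020, BIG_PRIME, 100)
```

In the variation, the primality requirement moves to M, so the table size m can be any convenient number.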

Multiplication method: h(k) = ⌊m((kA) mod 1)⌋
m is a power of 2.
A = (
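A floating-point sketch of the multiplication method follows. The value of A is cut off in these notes, so the constant used here, (√5 − 1)/2 ≈ 0.618, is an assumption: it is a commonly cited choice attributed to Knuth, not necessarily the one the notes intended. The function name is also illustrative.

```python
import math

# Assumed constant: (sqrt(5) - 1) / 2, a commonly cited choice for A.
A = (math.sqrt(5) - 1) / 2


def hash_multiplication(k, m, a=A):
    """Multiplication method: h(k) = floor(m * ((k * a) mod 1))."""
    fractional = (k * a) % 1.0   # fractional part of k * a
    return math.floor(m * fractional)


slot = hash_multiplication(123456, 2 ** 14)
```

Floating point keeps the sketch short but loses precision for very large keys; production implementations typically use fixed-width integer multiplication and a bit shift instead.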

Horner’s Method for Division Hash Function

If k = 〈k[1], ..., k[l]〉 is a sequence of radix-r digits, i.e., 0 ≤ k[i] < r, then compute the hash function by:

h ← k[1] mod m
for i ← 2 to l
    do h ← (rh + k[i]) mod m
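The loop above can be sketched in Python as follows. The function name `horner_hash` and the example (a byte string treated as radix-256 digits, with m = 701) are assumptions for illustration.

```python
def horner_hash(digits, r, m):
    """Hash a non-empty digit sequence (each 0 <= d < r) as a radix-r number mod m.

    Reducing mod m at every step keeps intermediate values below r * m,
    so the full radix-r number is never built.
    """
    h = digits[0] % m
    for d in digits[1:]:
        h = (r * h + d) % m
    return h


key = b"hash"                      # bytes are digits in radix 256
slot = horner_hash(list(key), 256, 701)

# Same result as converting the whole key to one big integer first.
direct = int.from_bytes(key, "big") % 701
```

The two results agree because (a·r + d) mod m depends only on a mod m, so reducing early does not change the final remainder.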

Chaining

In chaining, each slot holds a linked list of the elements that hash to that slot, so colliding elements share a list.
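A minimal sketch of a chained table, under stated assumptions: Python lists stand in for the linked lists, keys are non-negative integers, and the hash is the division method h(k) = k mod m. The class and method names are illustrative.

```python
class ChainedHashTable:
    """Hash table with chaining (illustrative sketch)."""

    def __init__(self, m):
        self.m = m
        self.slots = [[] for _ in range(m)]   # one chain per slot

    def _h(self, key):
        return key % self.m                   # assumed hash: division method

    def insert(self, key, value):
        chain = self.slots[self._h(key)]
        for idx, (k, _) in enumerate(chain):
            if k == key:                      # key already present: overwrite
                chain[idx] = (key, value)
                return
        chain.append((key, value))

    def search(self, key):
        for k, v in self.slots[self._h(key)]:
            if k == key:
                return v
        return None

    def delete(self, key):
        i = self._h(key)
        self.slots[i] = [(k, v) for (k, v) in self.slots[i] if k != key]


table = ChainedHashTable(7)
table.insert(5, "a")
table.insert(12, "b")          # 12 mod 7 == 5, so this collides with key 5
chain_length = len(table.slots[5])
value = table.search(12)
table.delete(5)
```

Search cost is proportional to the length of one chain, which is why the analysis is stated in terms of the load factor.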

Consider m slots, n elts., load factor α = n/m.
Worst-case: Θ(n) if all elts. hash to same slot.
Best-case: Θ(1 + α), each slot has ⌊α⌋ or ⌈α⌉ elts.

Average-case: Assume each elt. is equally likely to hash to any slot.

Unsuccessful search: Θ(1 + α). This is because the average list length is α.

Successful search: Θ(1 + α). Before the ith elt. is inserted, the average list length is (i − 1)/m, so the expected position of the ith elt. is 1 + (i − 1)/m.

Expected search length is the summation over the n elements to search for, where 1/n is the probability of searching for the ith element and 1 + (i − 1)/m is the expected position of the ith elt.:

$$\sum_{i=1}^{n} \frac{1}{n}\left(1 + \frac{i-1}{m}\right) = 1 + \frac{\alpha}{2} - \frac{1}{2m}$$
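The closed form 1 + α/2 − 1/(2m) can be checked against the term-by-term sum with exact rational arithmetic. This is a small verification sketch; the function names and the sample sizes n = 50, m = 100 are assumptions for illustration.

```python
from fractions import Fraction


def summed_search_length(n, m):
    """Sum over i = 1..n of (1/n) * (1 + (i - 1)/m), computed exactly."""
    return sum(Fraction(1, n) * (1 + Fraction(i - 1, m)) for i in range(1, n + 1))


def closed_form_search_length(n, m):
    """1 + alpha/2 - 1/(2m), with alpha = n/m."""
    alpha = Fraction(n, m)
    return 1 + alpha / 2 - Fraction(1, 2 * m)


n, m = 50, 100
by_sum = summed_search_length(n, m)
by_formula = closed_form_search_length(n, m)
```

With n = 50 and m = 100 the load factor is 1/2, and both computations give 249/200 = 1.245 expected probes, consistent with the Θ(1 + α) bound.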

Open-Address Hashing

In open addressing, when a collision occurs, probe for an empty slot and insert the new elt. there.

The hash function becomes: h : U × {0, ..., m − 1} → {0, ..., m − 1}

The probe sequence 〈h(k, 0), ..., h(k, m − 1)〉 should include all the slots.

Hash-Insert(T, x)
    for i ← 0 to m − 1
        do j ← h(key[x], i)
           if T[j] = nil
               then T[j] ← x
                    return j
    error “hash table overflow”

Hash-Delete marks the slot as deleted. Hash-Search must continue past deleted slots. Hash-Insert can put new elts. in deleted slots.
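The interaction between insert, search, and delete can be sketched as follows. Several things here are assumptions for illustration: the class name, the `DELETED` sentinel object, storing bare integer keys, and the probe function (h′(k) + i) mod m with h′(k) = k mod m, used only as a placeholder probe sequence. The sketch also assumes a key is never inserted while it is already in the table.

```python
DELETED = object()   # sentinel marking a slot whose element was removed


class OpenAddressTable:
    """Open-address hash table with deletion markers (illustrative sketch)."""

    def __init__(self, m):
        self.m = m
        self.slots = [None] * m          # None plays the role of nil

    def _probe(self, key, i):
        # Placeholder probe sequence; any h(k, i) that visits all slots works.
        return (key % self.m + i) % self.m

    def insert(self, key):
        for i in range(self.m):
            j = self._probe(key, i)
            # Empty and deleted slots are both available for new elements.
            if self.slots[j] is None or self.slots[j] is DELETED:
                self.slots[j] = key
                return j
        raise OverflowError("hash table overflow")

    def search(self, key):
        for i in range(self.m):
            j = self._probe(key, i)
            if self.slots[j] is None:    # a truly empty slot ends the search
                return None
            if self.slots[j] is not DELETED and self.slots[j] == key:
                return j
            # Deleted slot or a different key: keep probing.
        return None

    def delete(self, key):
        j = self.search(key)
        if j is not None:
            self.slots[j] = DELETED


table = OpenAddressTable(7)
slots_used = [table.insert(k) for k in (3, 10, 17)]   # all three hash to slot 3
table.delete(10)
found_after_delete = table.search(17)
reused = table.insert(24)                              # also hashes to slot 3
```

If delete simply reset the slot to nil, the search for 17 would stop at the gap left by 10 and wrongly report the key as absent; the marker keeps the probe chain intact.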

Performance of Practical Methods

Linear Probing: h(k, i) = (h′(k) + i) mod m

Successful Search: $\Theta\left(\frac{1}{1-\alpha}\right)$

Unsuccessful Search: $\Theta\left(\frac{1}{(1-\alpha)^2}\right)$

Linear probing suffers from primary clustering: long runs of occupied slots build up and tend to grow.

An empty slot preceded by i full slots gets filled next with probability (i + 1)/m.
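The (i + 1)/m figure follows from counting which starting slots lead to a given empty slot, assuming h′(k) is uniform over the m slots. The sketch below computes these probabilities exactly for a given occupancy pattern; the function name and the example table are assumptions, and the function requires at least one empty slot.

```python
from fractions import Fraction


def next_fill_probabilities(occupied, m):
    """Probability that each empty slot receives the next insertion.

    Assumes linear probing with h'(k) uniform over 0..m-1 and at least
    one empty slot (otherwise the inner loop would never terminate).
    """
    counts = {}
    for start in range(m):
        j = start
        while j in occupied:          # probe forward past full slots
            j = (j + 1) % m
        counts[j] = counts.get(j, 0) + 1
    return {slot: Fraction(c, m) for slot, c in counts.items()}


# Slots 2, 3, 4 form a run of 3 full slots in a table of size 10.
probs = next_fill_probabilities({2, 3, 4}, 10)
```

Slot 5 sits right after the run, so it is reached from starting slots 2, 3, 4, and 5, giving probability 4/10, while every other empty slot has probability 1/10. The longer a run is, the more likely it is to grow.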

Quadratic Probing assumes m is a power of 2.

$h(k, i) = \left(h'(k) + \frac{i}{2} + \frac{i^2}{2}\right) \bmod m$
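With offsets i/2 + i²/2 = i(i + 1)/2 and m a power of 2, the probe sequence visits every slot exactly once. The sketch below checks this for small tables; the function names are illustrative assumptions.

```python
def quadratic_probe(h_prime, i, m):
    """h(k, i) = (h'(k) + i/2 + i^2/2) mod m, for m a power of 2.

    i + i*i is always even, so the integer division is exact.
    """
    return (h_prime + (i + i * i) // 2) % m


def probe_sequence(h_prime, m):
    return [quadratic_probe(h_prime, i, m) for i in range(m)]


sequence = probe_sequence(0, 8)
```

For h′(k) = 0 and m = 8 the sequence is 0, 1, 3, 6, 2, 7, 5, 4, a permutation of the slots, so an insertion finds any remaining empty slot.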

Successful Search: $\Theta\left(\frac{1}{\alpha}\ln\frac{1}{1-\alpha}\right)$

Unsuccessful Search: $\Theta\left(\frac{1}{1-\alpha}\right)$

Double Hashing assumes m is prime and $1 \le h_2(k) \le m - 1$.

$h(k, i) = (h_1(k) + i\,h_2(k)) \bmod m$
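A sketch of a double-hashing probe sequence follows. The two component hashes, h1(k) = k mod m and h2(k) = 1 + (k mod (m − 1)), are assumptions chosen for the example (the notes only require that h2 fall in the range 1 to m − 1); the function names and m = 13 are likewise illustrative.

```python
def h1(k, m):
    return k % m                  # assumed primary hash


def h2(k, m):
    return 1 + (k % (m - 1))      # assumed secondary hash, always in 1..m-1


def double_hash_probe(k, i, m):
    """h(k, i) = (h1(k) + i * h2(k)) mod m, for m prime."""
    return (h1(k, m) + i * h2(k, m)) % m


m = 13
sequence = [double_hash_probe(14, i, m) for i in range(m)]
```

For k = 14 the step size is h2 = 3 starting from slot 1, giving 1, 4, 7, 10, 0, 3, and so on. Because m is prime, any step between 1 and m − 1 is relatively prime to m, so the sequence reaches all 13 slots.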

Successful Search: $\Theta\left(\frac{1}{\alpha}\ln\frac{1}{1-\alpha}\right)$

Unsuccessful Search: $\Theta\left(\frac{1}{1-\alpha}\right)$