



These notes cover three methods for implementing hash tables: direct-address tables, hash tables with chaining, and open-address hashing. Direct-address tables use an array to store data directly at the index corresponding to the key. Hash tables use a hash function to map keys to indices and handle collisions with chaining. Open-address hashing probes for empty slots when a collision occurs. The notes also cover good hash functions, universal hashing, and the analysis of uniform hashing.
Typology: Study notes




Direct-Address Tables
Let U = {0, ..., m − 1} be the set of possible keys. Use array T[0 .. m − 1] as a direct-address table. There is a 1-1 correspondence between keys and slots.
Direct-Address-Search(T, k)
    return T[k]
Direct-Address-Insert(T, x)
    T[key[x]] ← x
Direct-Address-Delete(T, x)
    T[key[x]] ← nil
Advantage: operations are Θ(1). Disadvantage: Θ(|U |) space required.
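The three operations above can be sketched in Python as follows. This is a minimal illustration, not the notes' own code: the class name DirectAddressTable and the choice to pass the key and value separately (instead of an element x with a key field) are assumptions made for brevity.

```python
class DirectAddressTable:
    """Direct-address table over the key universe {0, ..., m - 1}."""

    def __init__(self, m):
        # One slot per possible key, so space is Theta(|U|).
        self.slots = [None] * m

    def search(self, k):
        # Theta(1): the key is the array index.
        return self.slots[k]

    def insert(self, key, value):
        self.slots[key] = value

    def delete(self, key):
        self.slots[key] = None


table = DirectAddressTable(8)
table.insert(3, "three")
found = table.search(3)      # the stored value
table.delete(3)
missing = table.search(3)    # None after deletion
```

Every operation is a single array access, which is where the Θ(1) bound comes from; the cost is that the array has one slot for every key in U, whether or not it is used.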
Hash Tables
Let K be the set of keys to be stored.
Goal: use Θ(|K|) space and Θ(1) time/op.
Idea: Use array T[0 .. m − 1] as a hash table, and use a Θ(1) hash function h, where h : U → {0, ..., m − 1} maps keys to slots.
A collision occurs when two keys map to the same slot.
Good Hash Functions
Division method: h(k) = k mod m, where m is prime and not close to any 2^i.
Division variation: h(k) = (k mod M) mod m, where M is a big prime, not close to any 2^i, and m is any number much smaller than M.
Multiplication method: h(k) = ⌊m((kA) mod 1)⌋, where m is a power of 2. A = (
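The three formulas can be sketched in Python as below. The function names are hypothetical, and the value of A is truncated in these notes, so the constant A_EXAMPLE = (√5 − 1)/2 used here is only an assumed, commonly cited choice and not necessarily the one the notes intended. The prime 1000003 is likewise just an example value.

```python
import math


def hash_division(k, m):
    """Division method: h(k) = k mod m."""
    return k % m


def hash_division_variation(k, big_prime, m):
    """Division variation: h(k) = (k mod M) mod m, with M = big_prime."""
    return (k % big_prime) % m


def hash_multiplication(k, m, A):
    """Multiplication method: h(k) = floor(m * ((k * A) mod 1))."""
    return math.floor(m * ((k * A) % 1))


# Assumed illustrative constant; any 0 < A < 1 fits the formula.
A_EXAMPLE = (math.sqrt(5) - 1) / 2

slot_div = hash_division(100, 13)                       # 100 mod 13 = 9
slot_var = hash_division_variation(100, 1000003, 16)    # (100 mod 1000003) mod 16 = 4
slot_mul = hash_multiplication(123456, 2**14, A_EXAMPLE)
```

The multiplication method uses floating-point arithmetic here for readability; for very large keys a fixed-point version would avoid rounding error.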
Horner’s Method for Division Hash Function
If the key is a sequence of digits k = 〈k[1], ..., k[l]〉 in radix r, with 0 ≤ k[i] < r, compute the hash function without forming the full number:
h ← k[1] mod m
for i ← 2 to l do
    h ← (rh + k[i]) mod m
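A short Python sketch of the loop above, assuming the key is supplied as a list of digits with the most significant digit first. The name horner_hash is hypothetical. Starting from h = 0 gives the same result as starting from k[1] mod m, because the first iteration computes (r·0 + k[1]) mod m.

```python
def horner_hash(digits, r, m):
    """Hash a multi-digit key in radix r without building the full number."""
    h = 0
    for d in digits:
        # Reduce mod m at every step so h never exceeds r * m.
        h = (r * h + d) % m
    return h


# Decimal digits of 123 hashed into 7 slots: 123 mod 7 = 4.
small = horner_hash([1, 2, 3], 10, 7)

# A byte string treated as a number in radix 256.
text_digits = list(b"hash")
text_slot = horner_hash(text_digits, 256, 701)
```

The benefit is that intermediate values stay small, so long keys such as strings can be hashed with fixed-width arithmetic.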
Chaining
In chaining, each slot holds a linked list of the elements that hash to that slot, so colliding elements share a list.
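A minimal sketch of a chained hash table in Python. Two assumptions are made for brevity: each chain is a Python list instead of a linked list, and the hash function is the division method on integer keys. The class and method names are hypothetical.

```python
class ChainedHashTable:
    """Hash table with chaining; each slot holds a list of (key, value) pairs."""

    def __init__(self, m):
        self.m = m
        self.slots = [[] for _ in range(m)]
        self.n = 0  # number of stored elements

    def _hash(self, key):
        return key % self.m  # division method (assumed)

    def insert(self, key, value):
        chain = self.slots[self._hash(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                chain[i] = (key, value)  # overwrite an existing key
                return
        chain.append((key, value))
        self.n += 1

    def search(self, key):
        for k, v in self.slots[self._hash(key)]:
            if k == key:
                return v
        return None

    def delete(self, key):
        chain = self.slots[self._hash(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                del chain[i]
                self.n -= 1
                return True
        return False

    def load_factor(self):
        return self.n / self.m


table = ChainedHashTable(7)
table.insert(10, "a")   # 10 mod 7 = 3
table.insert(17, "b")   # 17 mod 7 = 3, collides with 10
table.insert(5, "c")
```

Here keys 10 and 17 land in the same slot and are both kept in that slot's chain, so a search for either one scans a chain of length 2.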
Consider m slots, n elements, and load factor α = n/m.
Worst case: Θ(n), if all elements hash to the same slot.
Best case: Θ(1 + α), when each slot holds ⌊α⌋ or ⌈α⌉ elements.
Average case: Assume each element is equally likely to hash to any slot.
Unsuccessful search: Θ(1 + α), because the average chain length is α.
Successful search: Θ(1 + α). Before the ith element is inserted, the average chain length is (i − 1)/m, so the expected position of the ith element is 1 + (i − 1)/m.
Expected search length is the summation:
(1/n) Σ_{i=1}^{n} (1 + (i − 1)/m) = 1 + α/2 − 1/(2m)
where
Σ_{i=1}^{n} ranges over the n elements to search for,
1/n is the probability of searching for the ith element,
1 + (i − 1)/m is the expected position of the ith element.
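The expected successful search length, (1/n) Σ_{i=1}^{n} (1 + (i − 1)/m), simplifies to 1 + α/2 − 1/(2m) because Σ (i − 1) = n(n − 1)/2. The following sketch checks that identity with exact rational arithmetic; the function names are hypothetical.

```python
from fractions import Fraction


def expected_successful_search(n, m):
    """Average over i = 1..n of the expected position 1 + (i - 1)/m."""
    total = sum(1 + Fraction(i - 1, m) for i in range(1, n + 1))
    return total / n


def closed_form(n, m):
    """1 + alpha/2 - 1/(2m), with alpha = n/m."""
    alpha = Fraction(n, m)
    return 1 + alpha / 2 - Fraction(1, 2 * m)


# Both sides agree exactly, e.g. for n = 10 elements in m = 4 slots.
lhs = expected_successful_search(10, 4)
rhs = closed_form(10, 4)
```

Since the −1/(2m) term is at most 1/2, the whole expression is Θ(1 + α).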
Open-Address Hashing
In open addressing, when a collision occurs, probe for an empty slot and insert the new element there.
The hash function becomes: h : U × {0, ..., m − 1} → {0, ..., m − 1}
The probe sequence 〈h(k, 0), ..., h(k, m − 1)〉 should include all the slots, i.e., be a permutation of 〈0, ..., m − 1〉.
Hash-Insert(T, x)
    for i ← 0 to m − 1 do
        j ← h(key[x], i)
        if T[j] = nil then
            T[j] ← x
            return j
    error “hash table overflow”
Hash-Delete marks the slot as deleted instead of setting it to nil. Hash-Search must continue past deleted slots. Hash-Insert can put new elements in deleted slots.
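The interaction between insert, search, and delete can be sketched as follows. This is an illustration under assumptions: keys are distinct integers, the probe function is linear probing on top of the division method, and the names OpenAddressTable and DELETED are hypothetical.

```python
DELETED = object()  # sentinel marking a slot whose element was removed


class OpenAddressTable:
    def __init__(self, m):
        self.m = m
        self.slots = [None] * m

    def _probe(self, key, i):
        # Assumed probe function: linear probing over key mod m.
        return (key % self.m + i) % self.m

    def insert(self, key):
        for i in range(self.m):
            j = self._probe(key, i)
            # Both empty and deleted slots may receive a new element.
            if self.slots[j] is None or self.slots[j] is DELETED:
                self.slots[j] = key
                return j
        raise OverflowError("hash table overflow")

    def search(self, key):
        for i in range(self.m):
            j = self._probe(key, i)
            if self.slots[j] is None:
                return None  # a truly empty slot ends the search
            if self.slots[j] is not DELETED and self.slots[j] == key:
                return j
            # Deleted slots are skipped; the search keeps probing.
        return None

    def delete(self, key):
        j = self.search(key)
        if j is not None:
            self.slots[j] = DELETED
        return j


table = OpenAddressTable(5)
first = table.insert(0)     # slot 0
second = table.insert(5)    # collides at slot 0, lands in slot 1
third = table.insert(10)    # collides twice, lands in slot 2
table.delete(5)             # slot 1 becomes DELETED
still_found = table.search(10)   # must probe past the deleted slot 1
reused = table.insert(15)        # reuses the deleted slot 1
```

If delete stored None instead of the sentinel, the search for key 10 would stop at slot 1 and wrongly report the key as absent.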
Performance of Practical Methods
Linear Probing: h(k, i) = (h′(k) + i) mod m
Successful Search: Θ(1/(1 − α))
Unsuccessful Search: Θ(1/(1 − α)^2)
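Linear probing suffers from primary clustering: long runs of occupied slots build up and tend to grow longer.
An empty slot preceded by i full slots gets filled next with probability (i + 1)/m, because a key whose initial probe lands on any of those i full slots, or on the empty slot itself, ends up there.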
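The (i + 1)/m probability can be checked by enumerating every possible initial slot h′(k), assuming each is equally likely. The function name next_fill_probabilities is hypothetical.

```python
from fractions import Fraction


def next_fill_probabilities(occupied, m):
    """For each empty slot, the probability that the next insert fills it."""
    if len(occupied) >= m:
        raise ValueError("table is full")
    counts = {}
    for start in range(m):          # every initial slot is equally likely
        j = start
        while j in occupied:        # linear probing: step to the next slot
            j = (j + 1) % m
        counts[j] = counts.get(j, 0) + 1
    return {slot: Fraction(c, m) for slot, c in counts.items()}


# Slots 2, 3, 4 form a run of i = 3 full slots in a table of m = 10.
probs = next_fill_probabilities({2, 3, 4}, 10)
after_run = probs[5]    # (3 + 1)/10
isolated = probs[0]     # (0 + 1)/10
```

The slot just after the run is four times as likely to be filled as an isolated empty slot, so existing runs tend to keep growing.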
Quadratic Probing assumes m is a power of 2.
h(k, i) = (h′(k) + i/2 + i^2/2) mod m
Successful Search: Θ((1/α) ln(1/(1 − α)))
Unsuccessful Search: Θ(1/(1 − α))
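The offset i/2 + i^2/2 equals i(i + 1)/2, which is always an integer. The sketch below generates the quadratic probe sequence and illustrates why m is required to be a power of 2: in that case the sequence visits every slot. The function name is hypothetical, and the comparison with m = 7 is only an example of what can go wrong otherwise.

```python
def quadratic_probe_sequence(h_prime, m):
    """Probe sequence h(k, i) = (h'(k) + i/2 + i^2/2) mod m for i = 0..m-1."""
    return [(h_prime + (i + i * i) // 2) % m for i in range(m)]


# With m = 8 (a power of 2) the sequence is a permutation of the slots.
seq_pow2 = quadratic_probe_sequence(0, 8)

# With m = 7 the same formula revisits slots and misses others.
seq_other = quadratic_probe_sequence(0, 7)
distinct_other = len(set(seq_other))
```

Two keys with the same h′(k) still follow the same probe sequence, so this scheme avoids primary clustering but not collisions between keys that share an initial slot.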
Double Hashing assumes m is prime and 1 ≤ h₂(k) ≤ m − 1.
h(k, i) = (h₁(k) + i h₂(k)) mod m
Successful Search: Θ((1/α) ln(1/(1 − α)))
Unsuccessful Search: Θ(1/(1 − α))
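A sketch of the double-hashing probe sequence. The notes do not specify h₁ and h₂, so the choices h₁(k) = k mod m and h₂(k) = 1 + (k mod (m − 1)) below are assumptions that satisfy the stated constraint 1 ≤ h₂(k) ≤ m − 1. The function name is hypothetical.

```python
def double_hash_probe_sequence(k, m):
    """Probe sequence h(k, i) = (h1(k) + i * h2(k)) mod m for i = 0..m-1."""
    h1 = k % m                # assumed primary hash
    h2 = 1 + (k % (m - 1))    # assumed secondary hash, always in 1..m-1
    return [(h1 + i * h2) % m for i in range(m)]


# m = 13 is prime, so any step size in 1..12 is coprime to m
# and the sequence visits every slot exactly once.
seq = double_hash_probe_sequence(14, 13)   # h1 = 1, h2 = 3
first_probes = seq[:4]
```

Because the step size depends on the key, two keys that share an initial slot usually follow different probe sequences, which is why double hashing behaves closer to uniform hashing than linear or quadratic probing.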