Copyright © 1996 Hanan Samet
These notes may not be reproduced by any means (mechanical or electronic or any other) without the express written permission of Hanan Samet
HASHING METHODS
Hanan Samet
Computer Science Department and Center for Automation Research and Institute for Advanced Computer Studies, University of Maryland, College Park, Maryland 20742
HASHING OVERVIEW
- Task: compare the value of a key with a set of key values in a table
- Conventional solutions:
- use a comparison on key values (tree-based)
- branching process governed by the digits comprising the key value (trie-based)
- Alternative solution: find a 1-1 mapping (i.e., a function) from the set of possible key values to memory addresses and use table lookup methods to retrieve the record, an O(1) process
- Problem: the set of possible key values is much larger than the number of available memory addresses
1. developing the 1-1 function h is time-consuming, as it requires puzzle-solving abilities
- result is called a perfect hashing function
2. once h is found, adding a single key value may render the function meaningless
3. can replace h by a program, which may itself be time-consuming to compute
- Result: usually abandon the goal of finding a 1-1 mapping and use a special method to resolve any ambiguity (i.e., when more than one key value is mapped to the same address, termed a collision)
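To make the collision problem concrete, here is a small Python sketch. It assumes h(k) = k mod m with m = 7, which is consistent with every address in the examples that follow. The sketch groups the example key values by address and reports the addresses that more than one key value maps to.

```python
# Assumed hash function h(k) = k mod m with m = 7 (consistent with the examples below).
M = 7

def h(k: int, m: int = M) -> int:
    """Map a key value to one of the m table addresses 0..m-1."""
    return k % m

keys = {"JIM": 49, "JOHN": 22, "RAY": 30, "SUZY": 3, "JANE": 14, "LUCY": 41}

# Group names by address; an address shared by several names is a collision.
by_address: dict[int, list[str]] = {}
for name, k in keys.items():
    by_address.setdefault(h(k), []).append(name)

collisions = {a: names for a, names in by_address.items() if len(names) > 1}
print(collisions)  # {0: ['JIM', 'JANE']}
```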
Copyright © 1998 by Hanan Samet
SEPARATE CHAINING
• Hash table of size m
• One chain (linked list) for each of the m hash values, containing all elements that hash to that location (known as a collision list)
• Hash chains are known as buckets
• Hash table locations are known as bucket addresses
• For n key values, the average chain size is n/m
- Retrieval
- use sequential search through the chain
- speed up unsuccessful search by sorting the chain by key value
- speed up successful search by self-organizing methods
- move a key value to the start of its chain each time it is accessed
- Ex: m = 7, h(k) = k mod 7
h(k)  NAME  k=KEY  NEXT
0     JIM   49     Λ
1     JOHN  22     Λ
2     RAY   30     Λ
3     SUZY  3      Λ
4
5
6
- add JANE(14) → 0: JANE is added to the collision list at address 0, which now holds JIM(49) and JANE(14)
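A minimal Python sketch of separate chaining follows, again assuming h(k) = k mod 7. Python lists stand in for the linked collision lists. New key values are appended at the end of their chain, which is an assumption, since the notes do not fix the insertion position. `search` applies the move-to-front rule from the Retrieval bullets. The class and method names are hypothetical.

```python
from typing import Optional

class ChainedHashTable:
    """Separate chaining: bucket i holds the collision list for hash value i."""

    def __init__(self, m: int = 7) -> None:
        self.m = m
        # One Python list per bucket stands in for a linked collision list.
        self.buckets: list[list[tuple[int, str]]] = [[] for _ in range(m)]

    def h(self, k: int) -> int:
        return k % self.m  # assumed hash function

    def insert(self, k: int, name: str) -> None:
        chain = self.buckets[self.h(k)]
        for i, (key, _) in enumerate(chain):
            if key == k:
                chain[i] = (k, name)  # key value already present: replace its record
                return
        chain.append((k, name))  # otherwise append to the end of the chain

    def search(self, k: int) -> Optional[str]:
        chain = self.buckets[self.h(k)]
        for i, (key, name) in enumerate(chain):  # sequential search
            if key == k:
                chain.insert(0, chain.pop(i))  # move-to-front on every access
                return name
        return None

t = ChainedHashTable()
for name, k in [("JIM", 49), ("JOHN", 22), ("RAY", 30), ("SUZY", 3), ("JANE", 14)]:
    t.insert(k, name)
print(t.buckets[0])  # [(49, 'JIM'), (14, 'JANE')]
print(t.search(14))  # JANE
print(t.buckets[0])  # [(14, 'JANE'), (49, 'JIM')]
```

Move-to-front costs one extra list operation per successful search. In return, frequently accessed key values stay near the head of their chains.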
IN-PLACE CHAINING
• When m is large, many of the chains are empty
- Use the empty locations in the table for the chains
- Must be able to distinguish between free and occupied locations
- Insertion algorithm:
- if the key value is not present, then allocate a free location
- link that location to the chain that was unsuccessfully searched
- Ex: m = 7, h(k) = k mod 7
h(k)  NAME  k=KEY  NEXT
0     JIM   49     Λ
1     JOHN  22     Λ
2     RAY   30     Λ
3     SUZY  3      Λ
4
5
6
- add JANE(14) → 0, which collides with JIM(49) → 0: JANE is stored in the free location 6 (JANE 14 Λ), and JIM's NEXT field now points to 6
- add LUCY(41) → 6, which collides with JANE(14) → 0 because JANE is stored at 6: LUCY goes into the next free location, 5 (LUCY 41 Λ), linked from JANE
- this coalesces the chains of JANE and LUCY, making unsuccessful search longer, as several chains must be searched
- Can avoid coalescing by moving JANE just before adding LUCY: JANE moves to location 5 (JIM's NEXT now points to 5), and LUCY takes location 6, her own hash address; both chains end in Λ
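One way to implement the relocation idea is the Python sketch below. The names are hypothetical, and h(k) = k mod m and downward allocation of free locations starting at m-1 are assumptions. When a new key value's home location holds a record from another chain, that record is moved to a free location and its predecessor's NEXT link is repaired. As a result, every chain keeps its first element at its own hash address, and chains never coalesce.

```python
from typing import Optional

class RelocatingInPlaceTable:
    """In-place chaining that relocates a record sitting in another key's home
    location, so chains never coalesce (illustrative sketch)."""

    def __init__(self, m: int = 7) -> None:
        self.m = m
        self.key: list[Optional[int]] = [None] * m   # None marks a free location
        self.next: list[Optional[int]] = [None] * m  # None plays the role of NIL
        self.r = m - 1  # free-location pointer, moving downward (assumption)

    def h(self, k: int) -> int:
        return k % self.m  # assumed hash function

    def _allocate(self) -> int:
        while self.r >= 0 and self.key[self.r] is not None:
            self.r -= 1
        if self.r < 0:
            raise OverflowError("hash table is full")
        return self.r

    def insert(self, k: int) -> int:
        i = self.h(k)
        if self.key[i] is None:  # free home location: k starts its own chain
            self.key[i], self.next[i] = k, None
            return i
        if self.h(self.key[i]) != i:
            # Location i holds a record of another chain: move that record to a
            # free location and point its predecessor at the new location.
            j = self._allocate()
            self.key[j], self.next[j] = self.key[i], self.next[i]
            p = self.h(self.key[j])  # head of the moved record's chain
            while self.next[p] != i:
                p = self.next[p]
            self.next[p] = j
            self.key[i], self.next[i] = k, None
            return i
        # Location i starts k's own chain: search it, appending k if absent.
        while self.key[i] != k and self.next[i] is not None:
            i = self.next[i]
        if self.key[i] == k:
            return i
        j = self._allocate()
        self.next[i] = j
        self.key[j], self.next[j] = k, None
        return j

t = RelocatingInPlaceTable()
for k in (49, 22, 30, 3, 14):  # JIM, JOHN, RAY, SUZY, JANE (JANE lands in 6)
    t.insert(k)
print(t.insert(41))            # 6: LUCY takes her home location
print(t.key[5], t.next[0])     # 14 5: JANE moved to 5 and JIM now links to 5
```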
IN-PLACE CHAINING INSERTION ALGORITHM
location procedure CHAINING_WITH_COALESCING_INSERTION(k);
begin
  value key k;
  integer i;
  global integer r; /* r is the most recently allocated location; initially m-1 */
  global hashtable table;
  i←h(k);
  if OCCUPIED(table[i]) then
    begin
      while NOT(NULL(NEXT(table[i]))) do
        begin
          if k=KEY(table[i]) then return(i)
          else i←NEXT(table[i]);
        end;
      if k=KEY(table[i]) then return(i);
      /* scan downward from r for a free location */
      while r≥0 and OCCUPIED(table[r]) do r←r-1;
      if r<0 then return('OVERFLOW')
      else
        begin
          NEXT(table[i])←r;
          i←r;
        end;
    end;
  MARK(table[i],'OCCUPIED');
  KEY(table[i])←k;
  NEXT(table[i])←NIL;
  return(i);
end;
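The procedure above can be transcribed roughly into Python as follows. This is a sketch, not the author's code. `None` stands for both a free location and NIL. The free-location pointer `r` starts at m-1, which matches the example, where JANE lands in 6. The `OVERFLOW` sentinel value is an illustrative choice, and h(k) = k mod m is assumed.

```python
from typing import Optional

OVERFLOW = -1  # illustrative sentinel returned when no free location remains

class CoalescedHashTable:
    """In-place chaining: collision lists are threaded through the table itself."""

    def __init__(self, m: int = 7) -> None:
        self.m = m
        self.key: list[Optional[int]] = [None] * m   # None marks a free location
        self.next: list[Optional[int]] = [None] * m  # None plays the role of NIL
        self.r = m - 1  # most recently allocated location; allocation moves downward

    def h(self, k: int) -> int:
        return k % self.m  # assumed hash function

    def insert(self, k: int) -> int:
        """Return the location of k, inserting it if absent (OVERFLOW if the table is full)."""
        i = self.h(k)
        if self.key[i] is not None:
            # Follow the chain through the home location to its last element.
            while self.next[i] is not None:
                if self.key[i] == k:
                    return i
                i = self.next[i]
            if self.key[i] == k:
                return i
            # Scan downward from r for a free location.
            while self.r >= 0 and self.key[self.r] is not None:
                self.r -= 1
            if self.r < 0:
                return OVERFLOW
            self.next[i] = self.r  # link the free location to the end of the chain
            i = self.r
        self.key[i] = k
        self.next[i] = None
        return i

t = CoalescedHashTable()
for k in (49, 22, 30, 3):      # JIM, JOHN, RAY, SUZY
    t.insert(k)
print(t.insert(14))            # 6: JANE collides with JIM at 0
print(t.insert(41))            # 5: LUCY's home location 6 already holds JANE
print(t.next[0], t.next[6])    # 6 5: the chains for 0 and 6 have coalesced
```

Replaying the example reproduces the coalescing: the chain starting at location 0 now runs through 6 and 5.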
LAMPSON'S IN-PLACE CHAINING
- Avoid extra space for the NEXT field by not storing the entire key value with the record
• $k = m \cdot q(k) + h(k)$, where $q(k) = \lfloor k/m \rfloor$ and $h(k) = k \bmod m$
• Store q(k) in the table instead of k
• Can compute k given m, q(k), and h(k)
• Ex: $0 \le k < 2^{32}$
• Since only q(k) is compared, all elements in the same collision list must have the same value of h(k), and thus no coalescing is allowed
- Data structure:
- circular collision lists
- flag FIRST denoting whether an element is the first on its collision list
- pointer NEXT to the next element in the circular list with the same h(k) value
- Ex: m = 7
h(k)  NAME  k=KEY  FIRST  q(k)  NEXT
0     JIM   49     T      7     0
1     JOHN  22     T      3     1
2     RAY   30     T      4     2
3     SUZY  3      T      0     3
4
5
6
[Figure: layout of a 32-bit key value k, with q(k) in bits 0-21 and h(k) in bits 22-31]
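The decomposition k = m·q(k) + h(k) is integer division with remainder. The short Python check below assumes m = 7, which is consistent with the q(k) column in the table. It recovers the q(k) values shown and rebuilds each key value from q(k) and h(k). The function names are hypothetical.

```python
M = 7  # table size in the example (assumption consistent with the q(k) column)

def split_key(k: int, m: int = M) -> tuple[int, int]:
    """Return (q(k), h(k)) such that k = m*q(k) + h(k)."""
    return divmod(k, m)

def join_key(q: int, h: int, m: int = M) -> int:
    """Rebuild k from the stored quotient q(k) and its hash address h(k)."""
    return m * q + h

for name, k in [("JIM", 49), ("JOHN", 22), ("RAY", 30), ("SUZY", 3), ("JANE", 14), ("LUCY", 41)]:
    q, h = split_key(k)
    assert join_key(q, h) == k  # nothing is lost by storing q(k) alone
    print(name, k, q, h)
```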
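- add JANE(14) → 0: JANE is stored in the free location 6 with FIRST = F, q(k) = 2, and NEXT = 0, and JIM's NEXT now points to 6, closing the circular list 0 → 6 → 0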
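- add LUCY(41) → 6, but 6 contains JANE
- if at least one element with hash value 6 exists, then the first of them must be stored at 6, because h(k) of any element is recovered from the location of the element marked FIRST on its circular list
- must move JANE, as she does not belong in 6: JANE moves to 5 (FIRST = F, q(k) = 2, NEXT = 0) and JIM's NEXT becomes 5, while LUCY is stored at 6 (FIRST = T, q(k) = 5, NEXT = 6)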
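The JANE and LUCY steps can be replayed with a Python sketch of Lampson's scheme. Several details are assumptions, not taken from the notes: h(k) = k mod m, free locations are allocated downward from m-1, and a new element is spliced in right after the first element of its circular list. All class and method names are hypothetical. Only q(k) is stored, and `key_at` recovers k by walking the circular list to the element marked FIRST, whose location is h(k).

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Slot:
    first: bool  # FIRST: True if this element is stored at its own hash address h(k)
    q: int       # q(k) = k // m, stored instead of the full key value k
    next: int    # NEXT: location of the next element on the circular collision list

class LampsonTable:
    """Sketch of in-place chaining with quotient storage and circular lists."""

    def __init__(self, m: int = 7) -> None:
        self.m = m
        self.slots: list[Optional[Slot]] = [None] * m
        self.r = m - 1  # free-location pointer, moving downward (assumption)

    def _allocate(self) -> int:
        while self.r >= 0 and self.slots[self.r] is not None:
            self.r -= 1
        if self.r < 0:
            raise OverflowError("hash table is full")
        return self.r

    def _pred(self, loc: int) -> int:
        """Return the location whose NEXT field points at loc."""
        p = loc
        while self.slots[p].next != loc:
            p = self.slots[p].next
        return p

    def key_at(self, loc: int) -> int:
        """Rebuild k = m*q(k) + h(k); h(k) is where the FIRST element of the list sits."""
        p = loc
        while not self.slots[p].first:
            p = self.slots[p].next
        return self.m * self.slots[loc].q + p

    def insert(self, k: int) -> int:
        q, h = divmod(k, self.m)
        home = self.slots[h]
        if home is None:
            # Empty home location: k starts a one-element circular list.
            self.slots[h] = Slot(True, q, h)
            return h
        if not home.first:
            # Location h holds an element of another list: move it to a free
            # location so that k's list can start at its own hash address.
            j = self._allocate()
            p = self._pred(h)
            self.slots[j] = home
            self.slots[p].next = j
            self.slots[h] = Slot(True, q, h)
            return h
        # Location h starts k's list: comparing q(k) suffices, since h(k) is shared.
        loc = h
        while True:
            if self.slots[loc].q == q:
                return loc
            loc = self.slots[loc].next
            if loc == h:
                break
        j = self._allocate()
        self.slots[j] = Slot(False, q, home.next)  # splice in right after the first element
        home.next = j
        return j

t = LampsonTable()
for k in (49, 22, 30, 3, 14):   # JIM, JOHN, RAY, SUZY, JANE
    t.insert(k)
print(t.slots[6])               # Slot(first=False, q=2, next=0)
print(t.insert(41))             # 6
print(t.slots[5], t.slots[6])   # Slot(first=False, q=2, next=0) Slot(first=True, q=5, next=6)
print(t.key_at(5), t.key_at(6)) # 14 41
```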
OPEN ADDRESSING
- Like chaining, but the NEXT link field is open, or unspecified
- Probe sequence: the set of locations comprising the collision list of a key
- Goal: cycle through all locations with little or no duplication
• Linear probing: h(k), h(k)+1, h(k)+2, …, m–1, 0, 1, …, h(k)–1
1. calculate the hash address i = h(k)
2. if TABLE(i) is empty then insert and exit; else i ← (i+1) mod m, and repeat step 2 until TABLE is exhausted
- Ex: m = 7, h(k) = k mod 7
h(k)  NAME  k=KEY
0     JIM   49
1     JOHN  22
2     RAY   30
3     SUZY  3
4
5
6
- adding JANE(14) → 0 yields a collision; the cyclic probe sequence 0, 1, 2, 3, 4 passes the occupied locations 0 through 3 and inserts JANE in 4
- adding LUCY(41) → 6: location 6 is empty, so LUCY is stored there
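A minimal Python sketch of linear probing follows. The names are hypothetical, and h(k) = k mod m is assumed. `probe` generates the cyclic sequence h(k), h(k)+1, …, m–1, 0, …, h(k)–1, visiting each location once. `search` stops at the first empty location it meets.

```python
from typing import Iterator, Optional

class LinearProbingTable:
    """Open addressing with linear probing; None marks an empty location."""

    def __init__(self, m: int = 7) -> None:
        self.m = m
        self.table: list[Optional[int]] = [None] * m

    def h(self, k: int) -> int:
        return k % self.m  # assumed hash function

    def probe(self, k: int) -> Iterator[int]:
        """Yield h(k), h(k)+1, ..., m-1, 0, ..., h(k)-1 (each location once)."""
        start = self.h(k)
        for step in range(self.m):
            yield (start + step) % self.m

    def insert(self, k: int) -> int:
        for i in self.probe(k):
            if self.table[i] is None or self.table[i] == k:
                self.table[i] = k
                return i
        raise OverflowError("table exhausted")

    def search(self, k: int) -> Optional[int]:
        for i in self.probe(k):
            if self.table[i] is None:  # an empty location ends the probe sequence
                return None
            if self.table[i] == k:
                return i
        return None

t = LinearProbingTable()
for k in (49, 22, 30, 3):  # JIM, JOHN, RAY, SUZY
    t.insert(k)
print(t.insert(14))  # 4
print(t.insert(41))  # 6
print(t.search(14))  # 4
```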
- delete RAY(30) → 2: location 2 becomes empty
- problem: a lookup of JANE now fails: her probe sequence starts at location 0 (occupied by JIM, a collision), continues to 1, and stops at location 2, which is now unoccupied, so JANE (stored at 4) is never reached
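The problem can be reproduced in a few lines of Python. The sketch assumes h(k) = k mod 7 and a naive deletion that simply marks the location empty; `naive_delete` and the other function names are hypothetical. After RAY is removed, the probe sequence for JANE stops at location 2.

```python
from typing import Iterator, Optional

M = 7
table: list[Optional[int]] = [None] * M

def probe(k: int) -> Iterator[int]:
    """Linear probe sequence h(k), h(k)+1, ... (mod M), with h(k) = k mod M assumed."""
    for step in range(M):
        yield (k % M + step) % M

def insert(k: int) -> int:
    for i in probe(k):
        if table[i] is None:
            table[i] = k
            return i
    raise OverflowError("table exhausted")

def search(k: int) -> Optional[int]:
    for i in probe(k):
        if table[i] is None:
            return None  # empty location: conclude that k is absent
        if table[i] == k:
            return i
    return None

def naive_delete(k: int) -> None:
    i = search(k)
    if i is not None:
        table[i] = None  # simply mark the location empty

for k in (49, 22, 30, 3, 14, 41):  # JIM, JOHN, RAY, SUZY, JANE, LUCY
    insert(k)
print(search(14))  # 4
naive_delete(30)   # delete RAY from location 2
print(search(14))  # None: the probe from 0 stops at the now-empty location 2
```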