Data Structures & Algorithm Analysis
How to Implement a Dictionary?
• Sequences
• Binary Search Trees
• Skip lists
• Hashtables
Basic Idea
• Use hash function to map keys into
positions in a hash table
Ideally
• If element e has key k and h is hash
function, then e is stored in position h(k) of
table
• To search for e , compute h(k) to locate
position. If no element, dictionary does not
contain e.
Example
- Dictionary Student Records
- Keys are ID numbers (951000 - 952000), no more than 100 students
- Hash function: h(k) = k-951000 maps ID into distinct table positions 0-1000
- array table[1001]
[Figure: hash table drawn as an array of buckets indexed 0, 1, 2, 3, ..., 1000]
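The ideal case above can be sketched in a few lines of Java. This is only an illustration: the class name StudentTable, the String record type, and the method names are assumptions, not part of the slides.

```java
// Sketch of the ideal case: one bucket per possible key (direct addressing).
// StudentTable, insert and search are hypothetical names chosen for this example.
final class StudentTable {
    static final int BASE = 951000;                    // smallest ID in the example
    private final String[] table = new String[1001];   // buckets 0..1000

    // h(k) = k - 951000, as in the example.
    static int h(int id) {
        return id - BASE;
    }

    // Stores the record in position h(id) of the table.
    void insert(int id, String record) {
        table[h(id)] = record;
    }

    // Returns the record, or null if the dictionary does not contain the key.
    String search(int id) {
        return table[h(id)];
    }
}
```

Because every ID has its own bucket, insert and search each touch exactly one array position and no collision handling is needed.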
Ideal Case is Unrealistic
- Works for implementing dictionaries, but many
applications have key ranges that are too large to have a 1-1 mapping between buckets and keys!
Example:
- Suppose key can take on values from 0 .. 65,535 (2 byte
unsigned int)
- Expect ≈ 1,000 records at any given time
- Impractical to use hash table with 65,536 slots!
Hash Functions
- If key range too large, use hash table with fewer
buckets and a hash function which maps
multiple keys to same bucket:
h(k_1) = β = h(k_2): k_1 and k_2 have a collision at slot β
- Popular hash functions: hashing by division
h(k) = k % D, where D is the number of buckets in the hash table
- Example: hash table with 11 buckets
h(k) = k % 11
80 → 3 (80%11= 3), 40 → 7, 65 → 10
58 → 3 collision!
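The division example can be checked with a short sketch. The class name DivisionHash and the helper collide are hypothetical, and the code assumes non-negative integer keys.

```java
// Sketch of hashing by division, assuming non-negative integer keys.
// DivisionHash and collide are illustrative names, not from the slides.
final class DivisionHash {
    // h(k) = k % D, where D is the number of buckets.
    static int h(int k, int D) {
        return k % D;
    }

    // Two keys collide when they share the same home bucket.
    static boolean collide(int k1, int k2, int D) {
        return h(k1, D) == h(k2, D);
    }
}
```

With D = 11, h(80, 11) and h(58, 11) both evaluate to 3, so collide(80, 58, 11) is true, matching the collision in the example.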
Closed Hashing
- Associated with closed hashing is a rehash strategy :
“If we try to place x in bucket h(x) and find it occupied, find alternative location h_1(x), h_2(x), etc. Try each in order; if none is empty, the table is full.”
- h(x) is called home bucket
- Simplest rehash strategy is called linear hashing (also known as linear probing)
h_i(x) = (h(x) + i) % D
- In general, our collision resolution strategy is to generate a
sequence of hash table slots (probe sequence) that can hold the record; test each slot until an empty one is found (probing)
Example Linear (Closed) Hashing
- D=8, keys a,b,c,d have hash values h(a)=3, h(b)=0, h(c)=4, h(d)=3
slot:  0  1  2  3  4  5  6  7
key:   b        a  c
- Where do we insert d? Home bucket 3 is already filled
- Probe sequence using linear hashing:
h_1(d) = (h(d)+1)%8 = 4%8 = 4 (occupied by c)
h_2(d) = (h(d)+2)%8 = 5%8 = 5 (empty, so d is placed here)
h_3(d) = (h(d)+3)%8 = 6%8 = 6
etc.: 7, 0, 1, 2
- The probe sequence wraps around to the beginning of the table!
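A minimal sketch of closed hashing with the linear rehash strategy h_i(x) = (h(x) + i) % D follows. It assumes integer keys with h(x) = x % D; the class LinearProbingTable and its methods are hypothetical names. Deletion is left out on purpose, since removing a record without a marker would break later probe sequences.

```java
// Sketch of closed hashing with linear probing, assuming integer keys and h(x) = x % D.
// LinearProbingTable is an illustrative name; deletion is intentionally not handled.
final class LinearProbingTable {
    private final Integer[] slots;   // null marks an empty bucket
    private final int D;             // number of buckets

    LinearProbingTable(int D) {
        this.D = D;
        this.slots = new Integer[D];
    }

    // Home bucket h(x); floorMod keeps the result in 0..D-1 for negative keys too.
    int home(int key) {
        return Math.floorMod(key, D);
    }

    // Follows the probe sequence h_i(x) = (h(x) + i) % D.
    // Returns the slot used, or -1 if no slot is empty (table is full).
    int insert(int key) {
        for (int i = 0; i < D; i++) {
            int slot = (home(key) + i) % D;
            if (slots[slot] == null || slots[slot] == key) {
                slots[slot] = key;
                return slot;
            }
        }
        return -1;
    }

    // Returns the slot holding key, or -1 if the key is absent.
    // An empty bucket on the probe sequence ends the search early.
    int search(int key) {
        for (int i = 0; i < D; i++) {
            int slot = (home(key) + i) % D;
            if (slots[slot] == null) return -1;
            if (slots[slot] == key) return slot;
        }
        return -1;
    }
}
```

To mirror the example with D = 8, the integer keys 3, 8, 4 and 11 can stand in for a, b, c and d, since their home buckets are 3, 0, 4 and 3. Inserting them in that order places 11 in slot 5, after probing the occupied slots 3 and 4.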
Performance Analysis - Worst Case
• Initialization: O(b), b = number of buckets
• Insert and search: O(n), n = number of
elements in table; occurs when all n key values have
the same home bucket
• No better than linear list for maintaining
dictionary!
Performance Analysis - Avg Case
- Distinguish between successful and
unsuccessful searches
- Delete = successful search for record to be deleted
- Insert = unsuccessful search along its probe sequence
- Expected cost of hashing is a function of how
full the table is: load factor α = n/b
- It has been shown that average costs under
linear hashing (probing) are:
- Insertion: (1/2)(1 + 1/(1 - α)^2)
- Deletion: (1/2)(1 + 1/(1 - α))
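To get a feel for how quickly these costs grow with the load factor, the two formulas can be evaluated directly. The class and method names below are hypothetical; the formulas are the ones quoted above.

```java
// Evaluates the average-cost formulas for linear probing quoted in the slides.
// LinearProbingCost, insertCost and deleteCost are illustrative names.
final class LinearProbingCost {
    // Insertion (unsuccessful search): (1/2)(1 + 1/(1 - alpha)^2)
    static double insertCost(double alpha) {
        double d = 1.0 - alpha;
        return 0.5 * (1.0 + 1.0 / (d * d));
    }

    // Deletion (successful search): (1/2)(1 + 1/(1 - alpha))
    static double deleteCost(double alpha) {
        return 0.5 * (1.0 + 1.0 / (1.0 - alpha));
    }
}
```

At α = 0.5 the formulas give 2.5 probes for insertion and 1.5 for deletion. At α = 0.9 they give roughly 50.5 and 5.5, which is why closed hash tables are usually kept well below full.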
Example
- h(k) = k % 11
I
bucket:  0     1     2     3     4     5     6     7     8     9     10
key:     1001  9537  3016                          9874  2009  9875
- What if next element has home bucket 0? → It goes to bucket 3. Same for elements with home bucket 1 or 2! A record with home bucket 3 stays in bucket 3. ⇒ p = 4/11 that next record will go to bucket 3
- Similarly, records hashing to 7, 8, 9 will end up in 10
- Only records hashing to 4 will end up in 4 (p = 1/11); same for 5 and 6
II insert 1052 (home bucket 7)
bucket:  0     1     2     3     4     5     6     7     8     9     10
key:     1001  9537  3016                          9874  2009  9875  1052
- Next element goes to bucket 3 with p = 8/11
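The clustering probabilities in this example can be reproduced by counting, for each empty bucket, how many home buckets lead to it under linear probing. The sketch assumes every home bucket is equally likely for the next record; ClusterProbability and landingCounts are hypothetical names.

```java
// Counts, for each bucket, how many home buckets would land there under linear probing.
// Assumes each home bucket is equally likely, so probability = count / D.
// ClusterProbability and landingCounts are illustrative names.
final class ClusterProbability {
    static int[] landingCounts(boolean[] occupied) {
        int D = occupied.length;
        int[] counts = new int[D];
        for (int home = 0; home < D; home++) {
            // Walk the probe sequence until the first empty bucket.
            for (int i = 0; i < D; i++) {
                int slot = (home + i) % D;
                if (!occupied[slot]) {
                    counts[slot]++;
                    break;
                }
            }
        }
        return counts;
    }
}
```

For table I (buckets 0, 1, 2, 7, 8, 9 occupied, D = 11) the count for bucket 3 is 4, giving p = 4/11. After 1052 fills bucket 10, the count for bucket 3 rises to 8, giving p = 8/11.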
Hash Functions - Numerical Value
- Consider: h(x) = x % 16
- poor distribution, not very random
- depends solely on least significant four bits of key
- Better, mid-square method
- if keys are integers in range 0, 1, …, K, pick integer C such that D·C^2 is about equal to K^2; then h(x) = ⌊x^2 / C⌋ % D extracts the middle r bits of x^2, where 2^r = D (a base-D digit)
- better, because most or all of bits of key contribute to result
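A small sketch of the mid-square method follows. The values K = 1000 and D = 8 are assumptions chosen for illustration, and MidSquareHash, chooseC and h are hypothetical names. C is picked by rounding K/√D, which is one way to satisfy D·C^2 ≈ K^2.

```java
// Sketch of the mid-square method: h(x) = floor(x^2 / C) % D.
// MidSquareHash and chooseC are illustrative names; K = 1000, D = 8 are example values.
final class MidSquareHash {
    // Picks C so that D * C^2 is about K^2, i.e. C is about K / sqrt(D).
    static int chooseC(int K, int D) {
        return (int) Math.round(K / Math.sqrt(D));
    }

    // Squares in long arithmetic to avoid int overflow, then divides and reduces mod D.
    static int h(int x, int C, int D) {
        long square = (long) x * x;
        return (int) ((square / C) % D);
    }
}
```

With K = 1000 and D = 8, chooseC gives C = 354. For x = 456, x^2 = 207936, ⌊207936 / 354⌋ = 587, and 587 % 8 = 3, so the key lands in bucket 3.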
Hash Functions - Strings of Characters
- Much better: Cyclic Shift
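static long hashCode(String key, int D) {
    int h = 0;
    for (int i = 0; i < key.length(); i++) {
        // rotate h left by 4 bits; >>> shifts in zeros, and 4 + 28 = 32 bits
        h = (h << 4) | (h >>> 28);
        h += (int) key.charAt(i);
    }
    // floorMod keeps the bucket index in 0..D-1 even when h is negative
    return Math.floorMod(h, D);
}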
Open Hashing
• Each bucket in the hash table is the head of a
linked list
• All elements that hash to a particular bucket
are placed on that bucket’s linked list
• Records within a bucket can be ordered in
several ways
- by order of insertion, by key value order, or by
frequency of access order
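Open hashing can be sketched with one linked list per bucket. The example assumes integer keys, h(k) = k % D, and records kept in order of insertion; ChainedTable and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Sketch of open hashing: each bucket is the head of a linked list.
// Assumes integer keys, h(k) = k % D, and insertion order within a bucket.
// ChainedTable is an illustrative name, not from the slides.
final class ChainedTable {
    private final List<LinkedList<Integer>> buckets = new ArrayList<>();
    private final int D;

    ChainedTable(int D) {
        this.D = D;
        for (int i = 0; i < D; i++) {
            buckets.add(new LinkedList<>());
        }
    }

    // The list for the key's home bucket.
    private LinkedList<Integer> chain(int key) {
        return buckets.get(Math.floorMod(key, D));
    }

    // Appends the key to its bucket's list; returns false if already present.
    boolean insert(int key) {
        LinkedList<Integer> c = chain(key);
        if (c.contains(key)) return false;
        c.addLast(key);
        return true;
    }

    boolean contains(int key) {
        return chain(key).contains(key);
    }

    // Integer.valueOf forces remove(Object) rather than remove(int index).
    boolean remove(int key) {
        return chain(key).remove(Integer.valueOf(key));
    }

    int chainLength(int bucket) {
        return buckets.get(bucket).size();
    }
}
```

With D = 11, inserting 80 and 58 puts both on the list for bucket 3. A collision only lengthens one list and never displaces records into other buckets.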