




























































































By surendra udainiya Bsc computer science delhi university
Typology: Study notes





























































































A dictionary is a container of elements from a totally ordered universe that supports the basic operations of inserting, deleting, and searching for a given element. (OR)
A dictionary is a dynamic-set data structure for storing items indexed by keys. It supports the operations Insert, Search, and Delete.
Ex: Hash tables provide an efficient implicit realization of a dictionary. Efficient explicit implementations include binary search trees and balanced search trees.
Dictionaries are useful in implementing the symbol table of a compiler, text retrieval systems, database systems, page mapping tables, large-scale distributed systems, etc.
Dictionaries can be implemented with:
1. Fixed-length arrays
2. Linked lists: sorted, unsorted, skip lists
3. Hash tables: open, closed
4. Trees: binary search trees (BSTs); balanced BSTs such as AVL trees, red-black trees, and splay trees; multiway search trees such as 2-3 trees and B-trees
5. Tries
Let n be the number of elements in a dictionary D. The following is a summary of the performance of some basic implementation methods:
Implementation          Search   Delete   Insert   Min
Array                   O(n)     O(n)     O(n)     O(n)
Sorted linked list      O(n)     O(n)     O(n)     O(1)
Unsorted linked list    O(n)     O(n)     O(n)     O(n)
Among these, the sorted linked list has the best performance, because its Min operation takes O(1) time. Apart from that one entry, arrays, sorted linked lists, and unsorted linked lists all take O(n) time for the Insert, Delete, Search, and Min operations.
HASH TABLES
Necessity of Hash tables:
We can access any position of an array in constant time (i.e., in O(1)). We think of the subscript as the key, and the value stored in the array as the data. Given the key, we can access the data in constant time.
Index:  0   1   2   3   4   5   6   7   8   9
Value:  10  20  25  30  35  40  45  50  55  60
Generally, a search operation in an array takes O(n) time. But if we know the position at which each element is stored (e.g., element 20 is stored at location 1 and element 45 at location 6), then we can access the element directly by moving to that position in O(1) time. This is the basic idea behind the implementation of hash tables.
Ex 1: Suppose an array of size 10 is used to store the roll numbers of 10 students. Roll no. 1 is placed at the 1st location, roll no. 2 at the 2nd location, roll no. 3 at the 3rd location, and so on.
Location:  1  2  3  4  5  6  7  8  9  10
Roll no.:  1  2  3  4  5  6  7  8  9  10
Here roll no. 9 can be accessed directly from the 9th location and roll no. 6 from the 6th location, so accessing an element always takes O(1) time. In real cases, however, this is not possible.
Ex 2: We have a list of employees of a fairly small company. Each of the 100 employees has an ID number in the range 0 to 99. If we store the elements (employee records) in an array, then each employee's ID number serves as the array index, and the record can be accessed in O(1) time.
In this case, once we know the ID number of an employee, we can directly access the record through the array index. There is a one-to-one correspondence between the element's key and the array index, so an employee's details can be accessed in O(1) time.
However, in practice, this perfect relationship is not easy to establish or maintain.
For example, suppose the ID numbers are five digits long, from 00000 to 99999. If we want to use the same technique as above, we need to set up an array of size 100,000, of which only 100 elements will be used (only 100 employees work in that company).
If two distinct keys x1 and x2 hash to the same value (h(x1) = h(x2)), then it is called a collision.
Ex: Assume the hash function h(k) = k mod 10. Then h(19) = 19 mod 10 = 9 and h(39) = 39 mod 10 = 9. Here h(19) = h(39); this is called a collision.
Collision resolution is the most important issue in hash table implementations. There are two techniques for resolving collisions: (1) open hashing (separate chaining) and (2) closed hashing (open addressing).
A perfect hash function is a function which, when applied to all the members of the set of items to be stored in a hash table, produces a unique set of integers within some suitable range. Such a function produces no collisions. A good hash function minimizes collisions by spreading the elements uniformly throughout the array.
There is no magic formula for creating a hash function. It can be any mathematical transformation that produces a relatively random and unique distribution of values within the address space of the storage. Although developing a hash function is a matter of trial and error, here are some hints that may make the process easier:
o Set the size of the storage space to a prime number. This will help generate a more uniform distribution of addresses.
o Use modulo arithmetic (%). Transform a key in such a way that you can perform X % TABLE_SIZE to generate the addresses.
o To transform a numeric key, try something like adding the digits together or picking every other digit.
o To transform a string key, try adding up the ASCII codes of the characters in the string and then performing modulo division.
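The last two hints can be sketched in C as follows. This is only an illustration: the names hash_string and hash_digits and the table size 101 are assumptions chosen for the example, not part of the notes.

```c
#include <assert.h>
#include <stddef.h>

#define TABLE_SIZE 101  /* a prime, following the first hint (illustrative value) */

/* Sum the character codes of a string key, then reduce modulo the table size. */
unsigned hash_string(const char *key)
{
    unsigned sum = 0;
    assert(key != NULL);
    for (size_t i = 0; key[i] != '\0'; i++)
        sum += (unsigned char)key[i];
    return sum % TABLE_SIZE;
}

/* Add the decimal digits of a numeric key, then reduce modulo the table size. */
unsigned hash_digits(unsigned key)
{
    unsigned sum = 0;
    while (key > 0) {
        sum += key % 10;
        key /= 10;
    }
    return sum % TABLE_SIZE;
}
```

Note that a plain sum ignores character order, so anagrams such as "AB" and "BA" always collide; this is one reason the hints describe hash design as trial and error.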
(1) Open Hashing (OR) Separate Chaining :
In this case the hash table is implemented as an array of linked lists. Every element of the table is a pointer to a list. The list (chain) contains all the elements with the same index produced by the hash function. In this technique the array does not hold the elements themselves; it holds the addresses of the lists attached to it.
Each position in the array contains a collection of values of unlimited size (we use a linked implementation of some sort, with dynamically allocated storage).
Here we chain all collisions in lists attached to the appropriate slot. This allows an unlimited number of collisions to be handled and doesn't require a priori knowledge of how many elements are contained in the collection. The tradeoff is the same as with linked-list versus array implementations of collections: linked-list overhead in space and, to a lesser extent, in time.
Let U be the universe of keys. The keys may be integers, character strings, or complex bit patterns.
Let B be the set of hash values (also called the buckets or bins): B = {0, 1, ..., m - 1}, where m > 0 is a positive integer.
A hash function h: U -> B associates buckets (hash values) with keys.
Ex: Collision Resolution by Chaining
Put all the elements that hash to the same value in a linked list. See Figure 1.1.
Example 1: See Figure 1.2. Consider the keys 0, 1, 2, 4, 16, 25, 36, 49, 64, 81, 100. Let the hash function be h(x) = x % 7.
Figure 1.2: Open hashing: An example
Here the list (chain) can be a sorted or an unsorted list. If the list is sorted, a search can stop as soon as it reaches a key larger than the one sought.
Operations of Open Hashing or Separate Chaining:
o Search(x, T): search for an element x in the list T[h(key(x))]
o Insert(x, T): insert x at the head of the list T[h(key(x))]
o Delete(x, T): delete x from the list T[h(key(x))]
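The three operations can be sketched in C roughly as below, using h(x) = x % 7 from Example 1. The function names (chain_search, chain_insert, chain_delete, chain_length) are assumptions made for this illustration, and keys are assumed to be non-negative integers.

```c
#include <assert.h>
#include <stdlib.h>

#define M 7  /* number of buckets, matching h(x) = x % 7 */

struct node {
    int key;
    struct node *next;
};

static struct node *table[M];  /* T[0..M-1]; every chain starts out empty (NULL) */

static int h(int key) { return key % M; }  /* assumes key >= 0 */

/* Search(x, T): walk the chain T[h(x)]; returns the node, or NULL if absent. */
struct node *chain_search(int key)
{
    struct node *p = table[h(key)];
    while (p != NULL && p->key != key)
        p = p->next;
    return p;
}

/* Insert(x, T): put x at the head of chain T[h(x)]; returns 0 if allocation fails. */
int chain_insert(int key)
{
    struct node *p = malloc(sizeof *p);
    if (p == NULL)
        return 0;
    p->key = key;
    p->next = table[h(key)];
    table[h(key)] = p;
    return 1;
}

/* Delete(x, T): unlink x from chain T[h(x)]; returns 1 if a node was removed. */
int chain_delete(int key)
{
    struct node **pp = &table[h(key)];
    while (*pp != NULL && (*pp)->key != key)
        pp = &(*pp)->next;
    if (*pp == NULL)
        return 0;
    struct node *victim = *pp;
    *pp = victim->next;
    free(victim);
    return 1;
}

/* Number of nodes currently stored in bucket b. */
int chain_length(int b)
{
    int len = 0;
    assert(b >= 0 && b < M);
    for (struct node *p = table[b]; p != NULL; p = p->next)
        len++;
    return len;
}
```

Inserting the keys of Example 1 in the order listed gives the chains 49, 0 in bucket 0; 64, 36, 1 in bucket 1; 100, 16, 2 in bucket 2; and 81, 25, 4 in bucket 4. Buckets 3, 5, and 6 stay empty. Each chain is in reverse insertion order because new keys go at the head.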
(a) Best case: To search for a key k, first compute h(k). Assuming that h(k) can be computed in O(1) time and that key k is the first node of its list, the complexity is O(1).
(b) Average case: Given a hash table T with m slots holding n elements, the load factor is α = n/m, the average number of keys per slot.
Unsuccessful search: Any key not already in the table is equally likely to hash to any of the m slots. To search unsuccessfully for a key k, we need to search to the end of the list T[h(k)], whose expected length is α. Adding the O(1) time to compute the hash function, the total time required for an unsuccessful search is O(1 + α).
Successful search: The probability that a list is searched is proportional to the number of elements it contains. Assume that the element being searched for is equally likely to be any of the n elements in the table. The number of elements examined during a successful search for an element x is 1 more than the number of elements that appear before x in x's list. These are the elements inserted after x was inserted.
Goal: Find the average, over the n elements x in the table, of how many elements were inserted into x's list after x was inserted. Let x_i be the i-th element inserted into the table, and let k_i = key[x_i]. Define indicator random variables X_ij = I{h(k_i) = h(k_j)}, for all i, j. Under simple uniform hashing, Pr{h(k_i) = h(k_j)} = 1/m, so E[X_ij] = 1/m.
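Expected number of elements examined in a successful search:
\[
\begin{aligned}
E\left[\frac{1}{n}\sum_{i=1}^{n}\left(1+\sum_{j=i+1}^{n}X_{ij}\right)\right]
&= \frac{1}{n}\sum_{i=1}^{n}\left(1+\sum_{j=i+1}^{n}E[X_{ij}]\right) \\
&= \frac{1}{n}\sum_{i=1}^{n}\left(1+\sum_{j=i+1}^{n}\frac{1}{m}\right) \\
&= 1+\frac{1}{nm}\sum_{i=1}^{n}(n-i) \\
&= 1+\frac{1}{nm}\left(\sum_{i=1}^{n}n-\sum_{i=1}^{n}i\right) \\
&= 1+\frac{1}{nm}\left(n^{2}-\frac{n(n+1)}{2}\right) \\
&= 1+\frac{n-1}{2m} \\
&= 1+\frac{\alpha}{2}-\frac{\alpha}{2n}
\end{aligned}
\]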
Expected total time for a successful search = time to compute the hash function + time to search = O(2 + α/2 - α/(2n)) = O(1 + α).
If n = O(m), then α = n/m = O(m)/m = O(1).
A successful search takes expected time O(1 + α), so searching takes constant time on average. Insertion is O(1) in the worst case. Deletion takes O(1) worst-case time when the lists are doubly linked. Hence, all dictionary operations take O(1) time on average with hash tables with chaining.
In the average case, the running time is O(1 + α), assuming that the hash value h(k) can be computed in O(1) time. If n is O(m), the average-case complexity of these operations becomes O(1).
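As a quick numeric check, the 11 keys and 7 buckets of Example 1 can be substituted into the load factor α = n/m and into the expected number of elements examined in a successful search, 1 + α/2 - α/(2n). The figures below are only this substitution, under the simple uniform hashing assumption:

```latex
\[
\alpha = \frac{n}{m} = \frac{11}{7} \approx 1.57,
\qquad
1 + \frac{\alpha}{2} - \frac{\alpha}{2n}
  = 1 + \frac{n-1}{2m}
  = 1 + \frac{10}{14}
  \approx 1.71
\]
```

So the expected chain length is about 1.57 nodes, and a successful search is expected to examine about 1.71 nodes.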
CLOSED HASHING (OR) OPEN ADDRESSING:
Rehashing is resolving a collision by computing a new hash location (index) in the array. Re-hashing schemes use a second hashing operation when there is a collision. If there is a further collision, we re-hash until an empty "slot" in the table is found. The re-hashing function can either be a new function or a re-application of the original one. As long as the functions are applied to a key in the same order, then a sought key can always be located.
Closed Hashing (or) Open Addressing:
In open addressing, all elements are stored in the hash table itself. Each slot of the hash table contains either a key or NIL. When a collision occurs, a systematic (consistent) procedure is used to store the element in a free slot of the table.
With open addressing, we store only one element per location and handle collisions by storing the extra elements in other unused locations in the array. To find these other locations, we fix some probe sequence that tells us where to look if A[h(x)] contains an element that is not x. Collisions are thus resolved by probing for free slots, and the hash formula determines the length and complexity of these probes.
Open addressing provides 3 different probing techniques:
1. Linear Probing
2. Quadratic Probing
3. Double Hashing
Insert 33: h(33) = 33 mod 5 = 3 (so place 33 at the 3rd location).
Index:  0   1   2      3   4
Value:  20  44  Empty  33  54
Insert 21: h(21) = 21 mod 5 = 1 (but the 1st location is already occupied).
h(21, 1) = (h(21) + 1) mod 5 = (1 + 1) mod 5 = 2 (the 2nd location is empty, so place 21 at the 2nd position).
Index:  0   1   2   3   4
Value:  20  44  21  33  54
Algorithm for Linear Probing Hashing: Insertion
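/* m is the table size; hashfunction(), empty(), deleted() and Error() are assumed to be defined elsewhere. */
void insert(int key, int r[])
{
    int i, last;
    i = hashfunction(key);        /* computes h(x) */
    last = (i + m - 1) % m;       /* last slot of the probe sequence */
    while (i != last && !empty(r[i]) && !deleted(r[i]) && r[i] != key)
        i = (i + 1) % m;          /* linear probing: try the next slot */
    if (empty(r[i]) || deleted(r[i]))
        r[i] = key;               /* insert here */
    else
        Error();                  /* table full, or key already in table */
}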
(b) Search(x, T): The search operation looks for an element x in hash table T and returns "search was successful" if the element is found. Otherwise it returns "search was unsuccessful".
Procedure:
First evaluate h(x) and examine slot h(x). Examining a slot is known as a probe. If slot h(x) contains key x, the search is successful. If the slot contains NIL, the search is unsuccessful. There is a third possibility: slot h(x) contains a key that is not x. In that case, compute the index of some other slot, based on x and on which probe we are on (apply linear probing). Keep probing until we either find key x or find a slot holding NIL.
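/* Returns the index of key in r[], or -1 if it is not in the table. m is the table size. */
int search(int key, int r[])
{
    int i, last;
    i = hashfunction(key);        /* computes h(x) */
    last = (i + m - 1) % m;       /* last slot of the probe sequence */
    while (i != last && !empty(r[i]) && r[i] != key)
        i = (i + 1) % m;          /* linear probing: try the next slot */
    if (r[i] == key)
        return i;
    else
        return -1;
}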
(c) Delete(x, T): This operation deletes the key x from hash table T.
Procedure:
First evaluate h(x) and examine slot h(x). Examining a slot is known as a probe.
If slot h(x) contains key x, delete the element and mark that location as deleted. The slot is marked rather than reset to NIL because a NIL slot would end later searches for keys stored further along the same probe sequence. Otherwise apply linear probing to locate the element. If the element is still not found after probing, report "Deletion is impossible because the element was not in the table".
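The interplay of insert, search, and delete under linear probing can be sketched as a small self-contained C program fragment. The EMPTY and DELETED marker values, the lp_ function names, and the restriction to non-negative keys are assumptions made for this illustration; the table size 5 follows the mod 5 example.

```c
#include <assert.h>

#define M 5          /* table size, as in the mod 5 example */
#define EMPTY   (-1) /* assumed marker: slot has never been used */
#define DELETED (-2) /* assumed marker: slot held a key that was removed */

static int slots[M] = { EMPTY, EMPTY, EMPTY, EMPTY, EMPTY };

static int h(int key) { return key % M; }  /* assumes key >= 0 */

/* Returns the slot index of key, or -1 if it is absent. */
int lp_search(int key)
{
    for (int probe = 0; probe < M; probe++) {
        int i = (h(key) + probe) % M;
        if (slots[i] == EMPTY)
            return -1;          /* a never-used slot ends the probe sequence */
        if (slots[i] == key)
            return i;
    }
    return -1;                  /* every slot was probed */
}

/* Returns the slot used, or -1 if the table is full or key is already present. */
int lp_insert(int key)
{
    assert(key >= 0);
    if (lp_search(key) != -1)
        return -1;
    for (int probe = 0; probe < M; probe++) {
        int i = (h(key) + probe) % M;
        if (slots[i] == EMPTY || slots[i] == DELETED) {
            slots[i] = key;     /* deleted slots may be reused */
            return i;
        }
    }
    return -1;
}

/* Marks the slot as DELETED; returns 1 on success, 0 if key was absent. */
int lp_delete(int key)
{
    int i = lp_search(key);
    if (i == -1)
        return 0;
    slots[i] = DELETED;
    return 1;
}
```

Inserting 20, 54, 44, 33, 21 in that order (an assumed order that is consistent with the tables shown above) places them in slots 0, 4, 1, 3, 2. After deleting 44, a search for 21 still succeeds because slot 1 is marked deleted rather than empty.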
Advantages:
o All the elements (or pointers to the elements) are placed in contiguous storage. This speeds up sequential searches when collisions do occur.
o It avoids pointers.
Disadvantages:
o Linear probing suffers from primary clustering: elements tend to cluster around elements that produce collisions, while gaps of unused locations remain elsewhere as the array fills. Long runs of occupied slots build up, and long runs tend to get longer, since an empty slot preceded by i full slots gets filled next with probability (i + 1)/m. Hence, average search and insertion times increase.
o As the number of collisions increases, the distance between the array index computed by the hash function and the actual location of the element increases, increasing search time.
o The hash table has a fixed size. At some point all the slots in the array will be filled. The only alternative at that point is to expand the table, which also means modifying the hash function to accommodate the increased address space.
(a) Insertion(x, T): This operation inserts an element x into a hash table T. If there is a collision while inserting an element, it applies quadratic probing.
Procedure:
First evaluate h(x). If location h(x) is empty, place x there. If there is a collision, apply quadratic probing to locate another slot; if that slot is empty, place x there, otherwise probe another location. This procedure is repeated until an empty slot is found. If there is no empty location in the hash table, insertion is not possible.
h(x, i) = (h(x) + i^2) mod m, where m is the hash table size and i = 0, 1, 2, ..., m - 1.
Insert 21: h(21) = 21 mod 10 = 1 (place 21 at the 1st location).
Index:  0   1   2   3  4  5  6  7   8   9
Value:  49  21  58  E  E  E  E  28  18  59
Insert 33: h(33) = 33 mod 10 = 3 (place 33 at the 3rd location).
Index:  0   1   2   3   4  5  6  7   8   9
Value:  49  21  58  33  E  E  E  28  18  59
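A minimal C sketch of quadratic probing with h(x) = x mod 10 is given below. The EMPTY marker, the qp_ function names, and the restriction to non-negative keys are assumptions for illustration only.

```c
#include <assert.h>

#define M 10         /* table size, as in the mod 10 example */
#define EMPTY (-1)   /* assumed marker for an unused slot; keys are assumed >= 0 */

static int slots[M] = { EMPTY, EMPTY, EMPTY, EMPTY, EMPTY,
                        EMPTY, EMPTY, EMPTY, EMPTY, EMPTY };

/* i-th probe position: h(x, i) = (h(x) + i^2) mod m */
int qp_probe(int key, int i)
{
    return (key % M + i * i) % M;
}

/* Returns the slot used, or -1 if none of the M probes found a free slot. */
int qp_insert(int key)
{
    assert(key >= 0);
    for (int i = 0; i < M; i++) {
        int pos = qp_probe(key, i);
        if (slots[pos] == EMPTY) {
            slots[pos] = key;
            return pos;
        }
    }
    return -1;
}

/* Returns the slot holding key, or -1 if it is absent. */
int qp_search(int key)
{
    for (int i = 0; i < M; i++) {
        int pos = qp_probe(key, i);
        if (slots[pos] == EMPTY)
            return -1;          /* an unused slot ends the probe sequence */
        if (slots[pos] == key)
            return pos;
    }
    return -1;
}
```

Inserting 59, 18, 49, 58, 28, 21, 33 in that order (an assumed order that reproduces the final table above) places them in slots 9, 8, 0, 2, 7, 1, 3. Key 28, for instance, probes 8, 9, and 2 before landing in slot 7 = (8 + 3^2) mod 10. Because a quadratic probe sequence does not necessarily visit every slot, qp_insert can report failure even when free slots remain.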
(b) Search(x, T): The search operation looks for an element x in hash table T and returns "search was successful" if the element is found. Otherwise it returns "search was unsuccessful".
Procedure:
First evaluate h(x) and examine slot h(x). Examining a slot is known as a probe. If slot h(x) contains key x, the search is successful. If the slot contains NIL, the search is unsuccessful. There is a third possibility: slot h(x) contains a key that is not x. In that case, compute the index of some other slot, based on x and on which probe we are on (apply quadratic probing). Keep probing until we either find key x or find a slot holding NIL.
/* Returns the index of key in r[], or -1 if it is not in the table. m is the table size. */
int search(int key, int r[])
{
    int h, i, j;
    h = hashfunction(key);        /* computes h(x) */
    i = h;
    j = 0;                        /* probe number */
    while (j < m - 1 && !empty(r[i]) && r[i] != key) {
        j = j + 1;
        i = (h + j * j) % m;      /* h(x, j) = (h(x) + j^2) mod m */
    }
    if (r[i] == key)
        return i;
    else
        return -1;
}
(c) Delete(x, T): This operation deletes the key x from hash table T.
Procedure:
First evaluate h(x) and examine slot h(x). Examining a slot is known as a probe. If slot h(x) contains key x, delete the element and mark that location as deleted, so that later searches can continue past it. Otherwise apply quadratic probing to locate the element. If the element is still not found after probing, report "Deletion is impossible because the element was not in the table".
Disadvantage: Quadratic probing can suffer from secondary clustering: if two keys have the same initial probe position, then their probe sequences are the same.
3. Double Hashing: It uses two different hash functions.
h1(x, i) = (h1(x) + i * h2(x)) mod m, where h1(x) is the first hash function, h2(x) is the second hash function, m is the hash table size, and i = 1, 2, 3, 4, etc.
Here h1 and h2 are two auxiliary hash functions: h1 gives the initial probe and h2 gives the remaining probes. h2(x) must be relatively prime to m, so that the probe sequence is a full permutation of 0, 1, ..., m - 1. Two ways to guarantee this:
o Choose m to be a power of 2 and have h2(x) always return an odd number.
o Let m be prime, and have 1 < h2(x) < m.
Suppose h1(x) = x mod m; then h2 can be selected as h2(x) = R - (x mod R), where R is a prime number near m.
Insert(x, T): This function inserts the key value x into hash table T.
Procedure:
First evaluate h1(x). If location h1(x) is empty, place x there. If there is a collision, apply double hashing to locate another slot; if that slot is empty, place x there, otherwise probe another location. This procedure is repeated until an empty slot is found. If there is no empty location in the hash table, insertion is not possible.
For example, consider a hash table of size 10 and insert the elements 59, 18, 49, 58, 21, 33 into the table (E indicates empty).
Index:  0  1  2  3  4  5  6   7  8   9
Value:  E  E  E  E  E  E  E   E  E   E
Insert 59: h1(59) = 59 mod 10 = 9 (place 59 at the 9th location).
Index:  0  1  2  3  4  5  6   7  8   9
Value:  E  E  E  E  E  E  E   E  E   59
Insert 18: h1(18) = 18 mod 10 = 8 (place 18 at the 8th location).
Index:  0  1  2  3  4  5  6   7  8   9
Value:  E  E  E  E  E  E  E   E  18  59
Insert 49: h1(49) = 49 mod 10 = 9 (the 9th location is already occupied, so apply double hashing).
h1(49, 1) = (h1(49) + 1 * h2(49)) mod 10, where h2(49) = R - (49 mod R) = 7 - (49 mod 7) = 7 - 0 = 7 (select R = 7, which is a prime near m).
h1(49, 1) = (9 + 1 * 7) mod 10 = 16 mod 10 = 6 (place 49 at the 6th position).
Index:  0  1  2  3  4  5  6   7  8   9
Value:  E  E  E  E  E  E  49  E  18  59
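The example can be replayed with a short C sketch using h1(x) = x mod 10 and h2(x) = 7 - (x mod 7). The EMPTY marker, the dh_ function names, and the restriction to non-negative keys are assumptions for illustration.

```c
#include <assert.h>

#define M 10        /* table size from the example */
#define R 7         /* prime below M used by the second hash function */
#define EMPTY (-1)  /* assumed marker for an unused slot; keys are assumed >= 0 */

static int slots[M] = { EMPTY, EMPTY, EMPTY, EMPTY, EMPTY,
                        EMPTY, EMPTY, EMPTY, EMPTY, EMPTY };

int hash1(int key) { return key % M; }          /* initial probe */
int hash2(int key) { return R - (key % R); }    /* step size, always in 1..R */

/* Returns the slot used, or -1 if M probes found no free slot. */
int dh_insert(int key)
{
    assert(key >= 0);
    int inc = hash2(key);
    for (int i = 0; i < M; i++) {
        int pos = (hash1(key) + i * inc) % M;   /* (h1(x) + i * h2(x)) mod m */
        if (slots[pos] == EMPTY) {
            slots[pos] = key;
            return pos;
        }
    }
    return -1;
}

/* Returns the slot holding key, or -1 if it is absent. */
int dh_search(int key)
{
    int inc = hash2(key);
    for (int i = 0; i < M; i++) {
        int pos = (hash1(key) + i * inc) % M;
        if (slots[pos] == EMPTY)
            return -1;
        if (slots[pos] == key)
            return pos;
    }
    return -1;
}
```

With this sketch, 59, 18, and 49 land in slots 9, 8, and 6, as in the tables above. Because m = 10 is not prime, some step sizes share a factor with m (key 58 has h2(58) = 5, so its probe sequence only alternates between slots 8 and 3); the loops are therefore capped at M probes rather than relying on a full permutation.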
/* hash1(key) is the first hash function, hash2(key) is the second hash function, m is the table size. */
int search(int key, int r[])
{
    int i, inc, last;
    i = hash1(key);                   /* initial probe */
    inc = hash2(key);                 /* step between successive probes */
    last = (i + (m - 1) * inc) % m;   /* position of the final probe */
    while (i != last && !empty(r[i]) && r[i] != key)
        i = (i + inc) % m;
    if (r[i] == key)
        return i;
    else
        return -1;
}
(c) Delete(x, T): This operation deletes the key x from hash table T.
Procedure:
First compute h1(x) and examine slot h1(x). Examining a slot is known as a probe. If slot h1(x) contains key x, delete the element and mark that location as deleted, so that later searches can continue past it. Otherwise apply double hashing to locate the element. If the element is still not found after probing, report "Deletion is impossible because the element was not in the table".
Advantages: Distributes keys more uniformly than linear probing
A Comparison of Rehashing Methods
Method              Distinct probe sequences   Clustering
Linear probing      m                          Primary clustering
Quadratic probing   m                          No primary clustering, but secondary clustering
Double hashing      m^2                        No primary clustering, no secondary clustering
Choosing a good hashing function, h(k) , is essential for hash-table based searching. h should distribute the elements of our collection as uniformly as possible to the "slots" of the hash table. The key criterion is that there should be a minimum number of collisions.
If the probability that a key, k, occurs in our collection is P(k), then if there are m slots in our hash table, a uniform hashing function, h(k), would ensure:
\[
\sum_{k \mid h(k)=0} P(k) = \sum_{k \mid h(k)=1} P(k) = \cdots = \sum_{k \mid h(k)=m-1} P(k) = \frac{1}{m}
\]
Sometimes, this is easy to ensure. For example, if the keys are randomly distributed in [0, r), then h(k) = floor((mk)/r) will provide uniform hashing.
Mapping keys to natural numbers
Most hashing functions will first map the keys to some set of natural numbers, say [0, r). There are many ways to do this. For example, if the key is a string of ASCII characters, we can simply add the ASCII representations of the characters mod 255 to produce a number in [0, 254], or we could xor them, or we could add them in pairs mod 2^16 - 1, or ...
Having mapped the keys to a set of natural numbers, we then have a number of possibilities.
1. DIVISION METHOD:
h(k) = k mod m.
When using this method, we usually avoid certain values of m. Powers of 2 are usually avoided, for k mod 2^b simply selects the b low-order bits of k. Unless we know that all the 2^b possible values of the low-order bits are equally likely, this will not be a good choice, because some bits of the key are not used in the hash function.
Prime numbers which are close to powers of 2 seem to be generally good choices for m.
For example, if we have 4000 elements, and we have chosen an overflow table organization, but wish to have the probability of collisions quite low, then we might choose m = 4093. (4093 is the largest prime less than 4096 = 2^12.)
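The warning about powers of 2 can be checked with a small C sketch. The function names hash_div and low_bits are assumptions for illustration.

```c
#include <assert.h>

/* Division method: h(k) = k mod m. */
unsigned hash_div(unsigned k, unsigned m)
{
    assert(m > 0);
    return k % m;
}

/* What k mod 2^b reduces to: the b low-order bits of k.
   Assumes b is smaller than the bit width of unsigned. */
unsigned low_bits(unsigned k, unsigned b)
{
    return k & ((1u << b) - 1u);
}
```

For instance, the keys 0x1234 and 0xAB34 differ only in their high-order bits, so with m = 16 both hash to slot 4, whereas with the prime m = 4093 they hash to different slots (567 and 2898).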
2. MULTIPLICATION METHOD:
o Multiply the key by a constant A, 0 < A < 1,
o Extract the fractional part of the product,
o Multiply this value by m.
Thus the hash function is:
h(k) = floor(m * (kA - floor(kA)))
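The formula can be written directly in C with floating-point arithmetic, as a rough sketch. The name hash_mul is an assumption, and the constant A is left as a parameter because the notes do not fix a value; A = 0.6180339887 (approximately (sqrt(5) - 1)/2) is a commonly suggested choice, not something stated here.

```c
#include <assert.h>

/* Multiplication method: h(k) = floor(m * (k*A - floor(k*A))), with 0 < A < 1. */
unsigned hash_mul(unsigned k, unsigned m, double A)
{
    assert(m > 0 && A > 0.0 && A < 1.0);
    double product = (double)k * A;
    /* For a non-negative value, truncation toward zero equals floor. */
    double frac = product - (double)(unsigned long long)product;  /* in [0, 1) */
    unsigned slot = (unsigned)((double)m * frac);
    if (slot >= m)          /* guard against floating-point rounding at the top edge */
        slot = m - 1;
    return slot;
}
```

For example, with A = 0.25 and m = 16, key 7 gives 7 * 0.25 = 1.75, fractional part 0.75, and slot floor(16 * 0.75) = 12.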
In this case, the value of m is not critical and we typically choose a power of 2 so that we can get the following efficient procedure on most digital computers:
o Choose m = 2^p.