Data Structure and Algorithm: Searching Techniques and Hash Functions, Study notes of Data Structures and Algorithms

An overview of data structure and algorithm concepts, focusing on searching techniques such as linear search and binary search, and on hash functions. It covers the process of searching for an element in a collection, the advantages of different searching techniques, and the concept of hash functions for indexing records. It also discusses hash collisions and methods for handling them, including open addressing and chaining.

Typology: Study notes

2014/2015

Uploaded on 01/16/2015

Raju.Rana

Data structure and Algorithm (BEIT III and SE III)
Nepal College of Information Technology
Prepared by: Madan Kadariya
Chapter 8: Searching
Searching is the process of checking for and finding an element in a list of elements. Let A be a
collection of data elements, i.e., A is a linear array of, say, n elements. If we want to find out whether
an element “data” is present in A, then we have to search for it. The search is successful if data appears
in A and unsuccessful otherwise. There are several searching techniques, each with some
advantage(s) over the others. The following are the three important searching techniques:
1. Linear or Sequential Searching
2. Binary Searching
3. Hashing
The records that are stored in a list being searched must conform to the following minimal standards:
• Every record is associated with a key.
• Keys can be compared for equality or relative ordering.
• Records can be compared to each other or to keys by first converting records to their
  associated keys.
1. Linear or Sequential Search
In linear search, each element of an array is read one by one sequentially and it is compared with the
desired element. A search will be unsuccessful if all the elements are read and the desired element is
not found.
Algorithm for Linear Search
Let A be an array of n elements, A[0], A[1], A[2], ..., A[n-1], and let “data” be the element to be
searched. This algorithm finds the data and displays its location if it is present; otherwise it displays
“data not found”.
1. Input an array A of n elements and the “data” to be searched, and initialize flag = 0.
2. Initialize i = 0 and repeat step 3 while (i < n), incrementing i by one each time.
3. If (data == A[i])
     i. Display “data is found at location i”
     ii. flag = 1
     iii. Return
4. If (flag == 0)
     i. Display “data is not found and searching is unsuccessful”
5. Exit
Source Code:
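#include <stdio.h>

/* Reads a value from the user and searches arr[0..n-1] for it sequentially. */
void search(int arr[], int n)
{
    int flag = 0;   /* becomes 1 once the data is found */
    int i, data;

    printf("Enter data to be searched\n");
    if (scanf("%d", &data) != 1)
        return;     /* no valid integer was entered */

    for (i = 0; i < n; i++)
    {
        if (arr[i] == data)
        {
            printf("\n %d data is found at location %d\n", data, i);
            flag = 1;
            break;  /* stop at the first match */
        }
    }
    if (flag == 0)
        printf("\n %d data is not found in the array\n", data);
}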
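The routine above mixes input, searching and output. As a minimal sketch, the same sequential scan can also be written as a function that returns the location instead of printing it. The name linear_search and the -1 "not found" return value are assumptions made for illustration; they are not part of the original notes.

```c
#include <assert.h>

/* Returns the index of the first element of arr[0..n-1] equal to data,
   or -1 (an assumed "not found" marker) if no element matches. */
int linear_search(const int arr[], int n, int data)
{
    assert(n >= 0);                 /* a negative length makes no sense */
    for (int i = 0; i < n; i++) {
        if (arr[i] == data)
            return i;               /* successful search */
    }
    return -1;                      /* all n elements read, none matched */
}
```

Because every element may have to be read, an unsuccessful search always costs n comparisons.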
2. Binary Search
Binary search is an extremely efficient algorithm compared to linear search. Because it halves the
search area at every step, it finds “data” in about log2(n) comparisons at most. The given array must be
sorted; otherwise we first have to sort the array elements.

Then apply the following steps to search for “data”.

  1. Find the middle element of the array (i.e., element n/2 is the middle element if the array or sub-array contains n elements).
  2. Compare the middle element with the data to be searched. There are three cases:
       i. If it is the desired element, the search is successful.
       ii. If it is greater than the desired data, search only the first half of the array, i.e., the elements to the left of the middle element.
       iii. If it is less than the desired data, search only the second half of the array, i.e., the elements to the right of the middle element.
  3. Repeat the same steps until the element is found or the search area is exhausted.

Algorithm for Binary Search
Let A be an array of n elements sorted in ascending order. “data” is the element to be searched. “mid” denotes the middle location of a segment (array or sub-array) of A. LB and UB are the lower and upper bounds of the segment under consideration.

[Figure: a segment running from the first value to the last value, with the mid value at (first + last)/2; the 1st half is searched to the left of mid and the 2nd half to the right.]

  1. Input an array A of n elements and the “data” to be searched.
  2. LB = 0, UB = n - 1; mid = int((LB + UB)/2)
  3. Repeat steps 4 to 6 while (LB <= UB) and (A[mid] != data)
  4. If (data < A[mid])
       i. UB = mid - 1
  5. Else
       i. LB = mid + 1
  6. mid = int((LB + UB)/2)
  7. If (A[mid] == data)
       i. Display “the data found”
  8. Else
       i. Display “the data is not found”
  9. Exit

Suppose we have a sorted array of 7 elements in which A[3] = 30, A[4] = 40 and A[5] = 45. The following steps are generated if we binary search for data = 45 in this array.

Step 1: LB = 0; UB = 6; mid = (0 + 6)/2 = 3; A[mid] = A[3] = 30
Step 2: Since A[3] < data (i.e., 30 < 45), reinitialise the variables LB, UB and mid: LB = mid + 1 = 4; UB = 6; mid = (4 + 6)/2 = 5; A[mid] = A[5] = 45
Step 3: Since A[5] == data (i.e., 45 == 45), the search is successful and the data is found at location 5.
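The algorithm can be sketched in C as follows. This is an illustrative version under a few assumptions: the function name binary_search and the -1 "not found" return value are hypothetical, the array is assumed to be sorted in ascending order, and the comparison with data is done inside the loop rather than in the loop condition.

```c
#include <assert.h>

/* Searches the ascending array a[0..n-1] for data.
   Returns its index, or -1 (an assumed marker) if it is absent. */
int binary_search(const int a[], int n, int data)
{
    assert(n >= 0);
    int lb = 0;                         /* lower bound of the segment */
    int ub = n - 1;                     /* upper bound of the segment */

    while (lb <= ub) {
        /* Same value as (lb + ub) / 2, written to avoid integer overflow. */
        int mid = lb + (ub - lb) / 2;

        if (a[mid] == data)
            return mid;                 /* search successful */
        if (data < a[mid])
            ub = mid - 1;               /* continue in the first half */
        else
            lb = mid + 1;               /* continue in the second half */
    }
    return -1;                          /* search area exhausted */
}
```

For a 7-element array such as {10, 20, 25, 30, 40, 45, 50}, where only the values at locations 3, 4 and 5 come from the worked example and the rest are made up, searching for 45 inspects locations 3 and 5 and returns 5.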

[Figure: Hash Table]

So if we pass the employee code to the hash function, we can retrieve the details at TABLE[H(k)] directly. Note that if the memory addresses begin with 01 instead of 00, then we have to choose the hash function H(k) = k (mod m) + 1.

3.2. Mid Square Method
The key k is squared. Then the hash function H is defined by
H(k) = l
where l is obtained by deleting digits from both ends of k^2, starting from the left. The same number of digits must be used for all of the keys. For example, consider the following keys in the table and their hash indexes:

[Table: Hash Table with Mid Square Division]

3.3. Folding Method
The key K is partitioned into a number of parts K1, K2, ..., Kr. The parts have the same number of digits as the required hash address, except possibly for the last part. Then the parts are added together, ignoring the last carry. That is,
H(K) = K1 + K2 + ... + Kr
Here we are dealing with a hash table with indexes from 00 to 99, i.e., a two-digit hash table, so we divide K into parts of two digits. Extra milling can also be applied: the even-numbered parts K2, K4, ... are each reversed before the addition (second table above). For example, 7148 is split into 71 and 48, and reversing the second part gives H(7148) = 71 + 84 = 155. Here we eliminate the leading carry (i.e., 1), so H(7148) = 55.
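The division and folding methods can be sketched in C as below. The function names and the reverse_even flag are hypothetical, and the folding routine assumes a two-digit hash table (addresses 00 to 99) with the key split into two-digit parts from the left, as in the H(7148) example.

```c
#include <assert.h>
#include <stdio.h>

/* Division method: H(k) = k mod m, giving addresses 0 .. m-1. */
unsigned hash_division(unsigned long k, unsigned m)
{
    assert(m > 0);
    return (unsigned)(k % m);
}

/* Variant for tables whose addresses start at 1: H(k) = k mod m + 1. */
unsigned hash_division_from_one(unsigned long k, unsigned m)
{
    assert(m > 0);
    return (unsigned)(k % m) + 1;
}

/* Folding method for a two-digit table. The decimal digits of k are
   split into two-digit parts from the left (the last part may have a
   single digit), the parts are added, and any carry beyond two digits
   is ignored. If reverse_even is non-zero, the even-numbered two-digit
   parts K2, K4, ... are reversed before the addition. */
unsigned hash_fold(unsigned long k, int reverse_even)
{
    char digits[32];
    int len = snprintf(digits, sizeof digits, "%lu", k);
    unsigned sum = 0;
    int part_no = 1;

    for (int i = 0; i < len; i += 2, part_no++) {
        unsigned part = (unsigned)(digits[i] - '0');
        int has_two_digits = (i + 1 < len);

        if (has_two_digits)
            part = part * 10 + (unsigned)(digits[i + 1] - '0');
        if (reverse_even && part_no % 2 == 0 && has_two_digits)
            part = (part % 10) * 10 + part / 10;   /* e.g. 48 -> 84 */
        sum += part;
    }
    return sum % 100;                              /* drop the carry */
}
```

With these definitions, hash_fold(7148, 1) adds 71 and 84 and returns 55, while hash_fold(7148, 0) adds 71 and 48 and returns 19.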

3.4. Hash Collision and Handling of Hash Collision
It is possible that two non-identical keys K1, K2 are hashed into the same hash address. This situation is called a hash collision. Let us consider a hash table having 10 locations, as in Table 1. The division method is used to hash the key:
H(K) = K (mod m), where m is chosen as 10.
The hash function produces an integer between 0 and 9 inclusive, depending on the value of the key. If we want to insert a new record with key 500, then H(500) = 500 (mod 10) = 0. Location 0 in the table is already filled (i.e., not empty), so a collision occurs. Collisions are almost impossible to avoid, but they can be minimized considerably by using any one of the following three techniques:

  1. Open addressing
  2. Chaining
  3. Bucket addressing

3.4.1. Open Addressing (Linear Probing)
In the open addressing method, when a key collides with another key, the collision is resolved by probing the cells for the nearest empty space. Suppose a record R with key K has the hash address H(K) = h. Then we linearly search the locations h + i (where i = 0, 1, 2, ..., m - 1), i.e., the hash addresses h, h + 1, h + 2, h + 3, ..., for a free space, wrapping around to the start of the table (mod m) when the end is reached.

To understand the concept, consider the hash collision in the hash table Table 1. If we try to insert a new record with key 500, then H(500) = 500 (mod 10) = 0. Array index 0 is already occupied by key 210, since H(210) = 0. With open addressing we resolve the hash collision by inserting the record in the next available free location in the table. Here the key 111 occupies the next location, i.e., array index 1. The next available free location in the table is array index 2, and we place the record there.

The position in which a key can be stored is found by sequentially searching all positions, starting from the position calculated by the hash function, until an empty cell is found. This type of probing is called Linear Probing. The main disadvantage of Linear Probing is that a substantial amount of time may be needed to find a free cell by searching the table sequentially.

i. Quadratic Probing
Suppose a record R with key K has the hash address H(K) = h. Then, instead of searching the locations with addresses h, h + 1, h + 2, ..., h + i, ..., we search for a free hash address among h, h + 1, h + 4, h + 9, h + 16, ..., h + i^2, ...

ii. Double Hashing
A second hash function H1 is used to resolve the collision. Suppose a record R with key K has the hash addresses H(K) = h and H1(K) = h1, where h1 is not equal to m. Then we linearly search the locations with addresses h, h + h1, h + 2h1, h + 3h1, ..., h + i·h1 (where i = 0, 1, 2, ...).

Note: The main drawback of any open addressing procedure is the implementation of deletion.

Table 1
Location   Keys   Records
0          210
1          111
2
3          883
4          344
5
6
7
8          488
9
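As a rough sketch, insertion with linear and quadratic probing might look like this in C for the ten-location table holding keys 210, 111, 883, 344 and 488. The names M, EMPTY, insert_linear and insert_quadratic are hypothetical, keys are assumed to be non-negative so that -1 can mark a free cell, and probe addresses are taken modulo M so that the search wraps around the table. Records are left out; only keys are stored.

```c
#include <assert.h>

#define M 10          /* number of locations in the table */
#define EMPTY (-1)    /* assumed marker for a free cell */

/* Division-method hash from the text: H(K) = K mod m. */
static int hash(int key)
{
    return key % M;
}

/* Linear probing: try h, h + 1, h + 2, ... (mod M).
   Returns the index where key was stored, or -1 if the table is full. */
int insert_linear(int table[M], int key)
{
    assert(key >= 0);
    int h = hash(key);
    for (int i = 0; i < M; i++) {
        int pos = (h + i) % M;
        if (table[pos] == EMPTY) {
            table[pos] = key;
            return pos;
        }
    }
    return -1;
}

/* Quadratic probing: try h, h + 1, h + 4, h + 9, ... (mod M).
   Gives up after M probes and returns -1; unlike linear probing,
   this sequence is not guaranteed to visit every cell. */
int insert_quadratic(int table[M], int key)
{
    assert(key >= 0);
    int h = hash(key);
    for (int i = 0; i < M; i++) {
        int pos = (h + i * i) % M;
        if (table[pos] == EMPTY) {
            table[pos] = key;
            return pos;
        }
    }
    return -1;
}
```

Inserting key 500 into that table with insert_linear probes indexes 0 and 1, finds both occupied, and stores the key at index 2, matching the walk-through in the text. With insert_quadratic the probes are 0, 1, 4 and then 9, so the key lands at index 9.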