Algorithm notes computer science, Study notes of Design and Analysis of Algorithms

By surendra udainiya Bsc computer science delhi university

Typology: Study notes

2019/2020

Uploaded on 01/28/2020

TutorialsDuniya.com
Design and Analysis of
Algorithms Notes
Visit https://www.tutorialsduniya.com for
Compiled books, Notes, books, programs,
question papers with solutions etc.
Facebook: https://www.facebook.com/tutorialsduniya
Youtube: https://www.youtube.com/user/TutorialsDuniya
LinkedIn: https://www.linkedin.com/company/tutorialsduniya



DICTIONARY

A dictionary is a container of elements from a totally ordered universe that supports the basic operations of inserting and deleting elements and searching for a given element. (OR)

A dictionary is a dynamic-set data structure for storing items indexed by keys. It supports the operations Insert, Search, and Delete.

Ex: Hash tables provide an efficient implicit realization of a dictionary. Efficient explicit implementations include binary search trees and balanced search trees.

Dictionaries: A dictionary is a dynamic set ADT with the operations:

  1. Makenull (D)
  2. Insert (x, D)
  3. Delete (x, D)
  4. Search (x, D)

Dictionaries are useful in implementing the symbol table of a compiler, text retrieval systems, database systems, page mapping tables, large-scale distributed systems, etc.

Dictionaries can be implemented with:

  1. Fixed-length arrays
  2. Linked lists: sorted, unsorted, skip lists
  3. Hash tables: open, closed
  4. Trees: Binary Search Trees (BSTs); balanced BSTs like AVL trees, red-black trees, splay trees; multiway search trees like 2-3 trees and B-trees
  5. Tries

Let n be the number of elements in a dictionary D. The following is a summary of the performance of some basic implementation methods:

                          Search   Delete   Insert   Min
  Array                   O(n)     O(n)     O(n)     O(n)
  Sorted linked list      O(n)     O(n)     O(n)     O(1)
  Unsorted linked list    O(n)     O(n)     O(n)     O(n)

- Among these, the sorted list has the best average-case performance.
- Arrays, sorted linked lists, and unsorted linked lists all take O(n) time for the Insert, Delete, and Search operations. Only the sorted linked list finds the minimum in O(1) time.

HASH TABLES

Necessity of Hash tables:

We can access any position of an array in constant time (i.e., in O(1)). We think of the subscript as the key, and the value stored in the array as the data. Given the key, we can access the data in constant time.

  Index:  0   1   2   3   4   5   6   7   8   9
  Value:  10  20  25  30  35  40  45  50  55  60

In general, a search operation on an array takes O(n) time. But if we know the position at which each element is stored (e.g., element 20 is stored at location 1 and element 45 at location 6), then we can access the element directly by moving to that position in O(1) time. This is the basic idea behind the implementation of hash tables.

Ex 1: Suppose an array of size 10 stores the roll numbers of 10 students. R.no. 1 is placed at the 1st location, R.no. 2 at the 2nd location, R.no. 3 at the 3rd location, and so on.

  Location:  1  2  3  4  5  6  7  8  9  10
  R.no.:     1  2  3  4  5  6  7  8  9  10

Here R.no. 9 can be accessed directly from the 9th location and R.no. 6 directly from the 6th location, so accessing an element always takes O(1) time. In real cases, however, this is not possible.

Ex 2: We have a list of employees of a fairly small company. Each of the 100 employees has an ID number in the range 0 to 99. If we store the employee records in an array, each employee's ID number is the index of the array element where that employee's record is stored.

Once we know the ID number of an employee, we can directly access the record through the array index. There is a one-to-one correspondence between the element's key and the array index, so an employee's details can be accessed in O(1) time.

However, in practice, this perfect relationship is not easy to establish or maintain.

Ex 3: The same company might use the employee's five-digit ID number as the key. With 5 digits the possible numbers range from 00000 to 99999. To use the same technique as above, we would need an array of size 100,000, of which only 100 elements would be used (only 100 employees work in that company).

(c) Collisions: If x1 and x2 are two different keys but their hash values are equal (i.e., h(x1) = h(x2)), it is called a collision.

Ex: Assume the hash function h(k) = k mod 10. Then h(19) = 19 mod 10 = 9 and h(39) = 39 mod 10 = 9, so h(19) = h(39). This is called a collision.

Collision resolution is the most important issue in hash table implementations. There are two techniques for resolving collisions:

  1. Open Hashing
  2. Closed Hashing

A perfect hash function is a function which, when applied to all the members of the set of items to be stored in a hash table, produces a unique set of integers within some suitable range. Such a function produces no collisions. A good hash function minimizes collisions by spreading the elements uniformly throughout the array.

There is no magic formula for creating a hash function. It can be any mathematical transformation that produces a relatively random and unique distribution of values within the address space of the storage. Although developing a hash function is a matter of trial and error, here are some hints that may make the process easier:

- Set the size of the storage space to a prime number. This helps generate a more uniform distribution of addresses.
- Use modulo arithmetic (%). Transform the key in such a way that X % TABLE_SIZE generates the address.
- To transform a numeric key, try something like adding the digits together or picking every other digit.
- To transform a string key, try adding up the ASCII codes of the characters in the string and then performing modulo division.
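The last two hints can be sketched in C as follows. This is only an illustration: the names hash_string and hash_digits and the table size 101 are assumptions made for the example, not part of the notes.

```c
#include <stddef.h>

/* Hypothetical table size for this sketch; a prime, as the first hint suggests. */
#define TABLE_SIZE 101

/* Hash a string key: add up the character codes, then reduce the sum
   modulo the table size. The result is a slot index in 0..TABLE_SIZE-1. */
unsigned int hash_string(const char *key)
{
    unsigned int sum = 0;
    for (size_t i = 0; key[i] != '\0'; i++)
        sum += (unsigned char)key[i];   /* add the code of each character */
    return sum % TABLE_SIZE;
}

/* Hash a numeric key: add the decimal digits together, then reduce
   modulo the table size. */
unsigned int hash_digits(unsigned int key)
{
    unsigned int sum = 0;
    while (key > 0) {
        sum += key % 10;                /* take the lowest digit */
        key /= 10;                      /* and drop it */
    }
    return sum % TABLE_SIZE;
}
```

Note that summing character codes ignores their order, so strings made of the same letters (for example "abc" and "cab") always collide. This is one reason such a function is a starting point rather than a good hash function.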

(1) Open Hashing (OR) Separate Chaining:

- In this case the hash table is implemented as an array of linked lists.
- Every element of the table is a pointer to a list. The list (chain) contains all the elements with the same index produced by the hash function.
- In this technique the array does not hold the elements themselves; it holds the addresses of the lists attached to each slot.
- Each position in the array thus refers to a collection of values of unlimited size (we use a linked implementation of some sort, with dynamically allocated storage).

Here we chain all collisions in lists attached to the appropriate slot. This allows an unlimited number of collisions to be handled and doesn't require a priori knowledge of how many elements are contained in the collection. The tradeoff is the same as with linked-list versus array implementations of collections: linked-list overhead in space and, to a lesser extent, in time.

- Let U be the universe of keys. The keys may be integers, character strings, or complex bit patterns.
- Let B be the set of hash values (also called the buckets or bins): B = {0, 1, ..., m - 1}, where m > 0 is a positive integer.

A hash function h: U → B associates buckets (hash values) with keys.

Ex: Collision Resolution by Chaining

Put all the elements that hash to the same value in a linked list. See Figure 1.1.

Example 1: See Figure 1.2. Consider the keys 0, 1, 2, 4, 16, 25, 36, 49, 64, 81, 100. Let the hash function be h(x) = x % 7.

Figure 1.2: Open hashing: An example

Here the list (chain) can be a sorted list or an unsorted list. If it is sorted, an unsuccessful search can stop as soon as it passes the position where the key would have been.

Operations of Open Hashing or Separate Chaining:

- Search (x, T): Search for an element x in the list T[h(key(x))]
- Insert (x, T): Insert x at the head of the list T[h(key(x))]
- Delete (x, T): Delete x from the list T[h(key(x))]
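The three operations above can be sketched in C roughly as follows. This is a minimal sketch under some assumptions: keys are non-negative integers, the table has 7 slots as in h(x) = x % 7, and the names chain_search, chain_insert, chain_delete and chain_length are made up for this illustration.

```c
#include <stdlib.h>

#define M 7                       /* number of slots, matching h(x) = x % 7 */

struct node {
    int key;
    struct node *next;
};

static struct node *table[M];     /* T[0..M-1]; every chain starts empty (NULL) */

static int h(int key) { return key % M; }   /* assumes key >= 0 */

/* Search(x, T): return the node holding key, or NULL if it is absent. */
struct node *chain_search(int key)
{
    struct node *p = table[h(key)];
    while (p != NULL && p->key != key)
        p = p->next;
    return p;
}

/* Insert(x, T): put key at the head of its chain in O(1) time.
   Assumes key is not already present. Returns 1 on success, 0 if out of memory. */
int chain_insert(int key)
{
    struct node *p = malloc(sizeof *p);
    if (p == NULL)
        return 0;
    p->key = key;
    p->next = table[h(key)];
    table[h(key)] = p;
    return 1;
}

/* Delete(x, T): unlink and free the node holding key.
   Returns 1 if a node was removed, 0 if key was not found. */
int chain_delete(int key)
{
    struct node **pp = &table[h(key)];
    while (*pp != NULL && (*pp)->key != key)
        pp = &(*pp)->next;
    if (*pp == NULL)
        return 0;
    struct node *victim = *pp;
    *pp = victim->next;
    free(victim);
    return 1;
}

/* Number of keys currently chained in one slot. */
int chain_length(int slot)
{
    int count = 0;
    for (struct node *p = table[slot]; p != NULL; p = p->next)
        count++;
    return count;
}
```

With the keys of Example 1, slot 1 would hold 1, 36 and 64, while slots 3, 5 and 6 stay empty. The delete shown here walks a singly linked chain; the O(1) worst-case deletion mentioned later in the notes needs doubly linked lists and a pointer to the element.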

(i) Search:

(a) Best case: To search for a key k, first compute h(k). Assuming that h(k) can be computed in O(1) time and that key k is the first node of its list, the complexity is O(1).

(b) Average case: Given a hash table T with m slots holding n elements, the load factor is α = n/m, the average number of keys per slot.

m: number of slots, n: number of elements stored in the hash table.

(i) Unsuccessful Search: Uniform hashing yields an average list length α = n/m.

- Any key not already in the table is equally likely to hash to any of the m slots.
- To search unsuccessfully for any key k, we need to search to the end of the list T[h(k)], whose expected length is α.
- Adding the O(1) time to compute the hash function, the total time required for an unsuccessful search is O(1 + α).

(ii) Successful Search:

- The probability that a list is searched is proportional to the number of elements it contains.
- Assume that the element being searched for is equally likely to be any of the n elements in the table.
- The number of elements examined during a successful search for an element x is 1 more than the number of elements that appear before x in x's list. Since new elements are inserted at the head of the list, these are the elements inserted after x was inserted.
- Goal: Find the average, over the n elements x in the table, of how many elements were inserted into x's list after x was inserted.
- Let x_i be the i-th element inserted into the table, and let k_i = key[x_i].
- Define indicator random variables X_ij = I{h(k_i) = h(k_j)}, for all i, j.
- Simple uniform hashing ⇒ Pr{h(k_i) = h(k_j)} = 1/m ⇒ E[X_ij] = 1/m.
- The expected number of elements examined in a successful search is:

\[
\begin{aligned}
E\left[\frac{1}{n}\sum_{i=1}^{n}\left(1+\sum_{j=i+1}^{n}X_{ij}\right)\right]
&= \frac{1}{n}\sum_{i=1}^{n}\left(1+\sum_{j=i+1}^{n}E[X_{ij}]\right) \\
&= \frac{1}{n}\sum_{i=1}^{n}\left(1+\sum_{j=i+1}^{n}\frac{1}{m}\right) \\
&= 1+\frac{1}{nm}\sum_{i=1}^{n}(n-i) \\
&= 1+\frac{1}{nm}\left(\sum_{i=1}^{n}n-\sum_{i=1}^{n}i\right) \\
&= 1+\frac{1}{nm}\left(n^{2}-\frac{n(n+1)}{2}\right) \\
&= 1+\frac{n-1}{2m} \\
&= 1+\frac{\alpha}{2}-\frac{\alpha}{2n}
\end{aligned}
\]


Expected total time for a successful search = Time to compute hash function + Time to search

= O(2 + α/2 - α/(2n)) = O(1 + α).

If n = O(m), then α = n/m = O(m)/m = O(1).

- A successful search takes expected time O(1 + α), so searching takes constant time on average when n = O(m).
- Insertion is O(1) in the worst case.
- Deletion takes O(1) worst-case time when the lists are doubly linked.
- Hence, all dictionary operations take O(1) time on average with hash tables with chaining.

In the average case, the running time is O(1 + α), assuming that the hash value h(k) can be computed in O(1) time. If n is O(m), the average-case complexity of these operations becomes O(1).

CLOSED HASHING (OR) OPEN ADDRESSING:

Rehashing

- Rehashing resolves a collision by computing a new hash location (index) in the array.
- Rehashing schemes use a second hashing operation when there is a collision. If there is a further collision, we rehash until an empty "slot" in the table is found.
- The rehashing function can either be a new function or a re-application of the original one. As long as the functions are applied to a key in the same order, a sought key can always be located.

Closed Hashing (or) Open Addressing:

- In open addressing all elements are stored in the hash table itself. Each slot of the hash table contains either a key or NIL.
- When a collision occurs, a systematic (consistent) procedure is used to store the element in a free slot of the table.
- Open addressing is a standard hash table implementation.
- With open addressing, we store only one element per location and handle collisions by storing the extra elements in other unused locations in the array.
- To find these other locations, we fix some probe sequence that tells us where to look if A[h(x)] contains an element that is not x.
- It is based on resolving collisions by probing for free slots.
- The hash formula determines the length and complexity of these probes.

Open addressing provides 3 different probing techniques:

  1. Linear Probing
  2. Quadratic Probing
  3. Double Hashing

Insert 33: h(33) = 33 mod 5 = 3 (so place 33 at the 3rd location)

  Index:  0   1   2      3   4
  Key:    20  44  Empty  33  54

Insert 21: h(21) = 21 mod 5 = 1 (but the 1st location already holds an element)

h(21, 1) = (h(21) + 1) mod 5 = (1 + 1) mod 5 = 2 (the 2nd location is empty, so place 21 at the 2nd position)

  Index:  0   1   2   3   4
  Key:    20  44  21  33  54

Algorithm for Linear Probing Hashing: Insertion

    /* Assumes the table size m and the helpers hashfunction(), empty(),
       deleted() and error() are defined elsewhere. */
    void insert(int key, int r[])
    {
        int i, last;

        i = hashfunction(key);      /* computes h(x) */
        last = (i + m - 1) % m;     /* last slot in the probe sequence */
        while (i != last && !empty(r[i]) && !deleted(r[i]) && r[i] != key)
            i = (i + 1) % m;
        if (empty(r[i]) || deleted(r[i]))
            r[i] = key;             /* insert here */
        else
            error("table full, or key already in table");
    }

(b) Search(x,T): The search operation looks for an element x in the hash table T and returns "search was successful" if the element is found. Otherwise it returns "search was unsuccessful".

Procedure:

- First evaluate h(x) and examine slot h(x). Examining a slot is known as a probe.
- If slot h(x) contains key x, the search is successful. If the slot contains NIL, the search is unsuccessful.
- There is a third possibility: slot h(x) contains a key that is not x. In that case compute the index of some other slot, based on x and on which probe we are on (apply linear probing).
- Keep probing until we either find key x or find a slot holding NIL.

Algorithm for Linear Probing Hashing: Search

    /* Assumes the table size m and the helpers hashfunction() and empty()
       are defined elsewhere. Returns the slot index of key, or -1. */
    int search(int key, int r[])
    {
        int i, last;

        i = hashfunction(key);      /* computes h(x) */
        last = (i + m - 1) % m;     /* probe at most m slots; slots marked
                                       deleted are not empty and are skipped */
        while (i != last && !empty(r[i]) && r[i] != key)
            i = (i + 1) % m;
        if (r[i] == key)
            return i;
        else
            return -1;
    }

(c) Delete(x,T): This operation deletes the key x from the hash table T.

Procedure:

- First evaluate h(x) and examine slot h(x). Examining a slot is known as a probe.
- If slot h(x) contains key x, delete the element and mark that location as deleted (the state tested by deleted() in the insertion algorithm) rather than resetting it to NIL, so that later searches do not stop early at that slot. Otherwise apply linear probing to locate the element. If the element is still not found after probing, display "Deletion is impossible because the element was not in the table".

Advantages:

- All the elements (or pointers to the elements) are placed in contiguous storage. This speeds up sequential searches when collisions do occur.
- It avoids pointers.

Disadvantages:

- Linear probing suffers from the primary clustering problem: elements tend to cluster around elements that produce collisions. As the array fills, there will be gaps of unused locations.
  o Long runs of occupied slots build up.
  o Long runs tend to get longer, since an empty slot preceded by i full slots gets filled next with probability (i + 1)/m.
  o Hence, average search and insertion times increase.
- As the number of collisions increases, the distance between the array index computed by the hash function and the actual location of the element increases, increasing search time.
- The hash table has a fixed size. At some point all the slots in the array will be filled. The only alternative at that point is to expand the table, which also means modifying the hash function to accommodate the increased address space.
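Insertion, search and deletion with linear probing can be put together in one small C sketch. It is an illustration under stated assumptions: keys are non-negative integers, the table has 5 slots with h(x) = x mod 5, and the sentinels EMPTY and DELETED and the names lp_search, lp_insert and lp_delete are made up for this example.

```c
#define M 5             /* table size used in this sketch */
#define EMPTY   (-1)    /* sentinel: slot has never held a key (keys assumed >= 0) */
#define DELETED (-2)    /* sentinel: slot held a key that was later removed */

static int slots[M] = { EMPTY, EMPTY, EMPTY, EMPTY, EMPTY };

static int h(int key) { return key % M; }

/* Return the index of key, or -1 if it is not in the table. */
int lp_search(int key)
{
    int i = h(key);
    for (int probes = 0; probes < M; probes++) {
        if (slots[i] == EMPTY)
            return -1;              /* a never-used slot ends the search */
        if (slots[i] == key)
            return i;
        i = (i + 1) % M;            /* occupied or DELETED: keep probing */
    }
    return -1;                      /* every slot was probed */
}

/* Insert key into the first EMPTY or DELETED slot of its probe sequence.
   Returns the slot index, or -1 if the key is already present or the table is full. */
int lp_insert(int key)
{
    if (lp_search(key) != -1)
        return -1;
    int i = h(key);
    for (int probes = 0; probes < M; probes++) {
        if (slots[i] == EMPTY || slots[i] == DELETED) {
            slots[i] = key;
            return i;
        }
        i = (i + 1) % M;
    }
    return -1;
}

/* Delete key by marking its slot DELETED. Returns 1 if removed, 0 if not found. */
int lp_delete(int key)
{
    int i = lp_search(key);
    if (i == -1)
        return 0;
    slots[i] = DELETED;
    return 1;
}
```

Inserting 20, 54, 44, 33, 21 in that order (an assumed order; the notes only show the last two steps) gives the table 20, 44, 21, 33, 54. Deleting 44 leaves a DELETED mark in slot 1, so a later search for 21 still continues past slot 1 and finds it in slot 2.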

  2. Quadratic Probing: This is a different way of rehashing. With quadratic probing we are still looking for an empty location. However, instead of moving 1 slot further on every probe, as in linear probing, we try the offsets 1, 4, 9, 16, ... from the home position. We explore a sequence of locations until an empty one is found, as follows:

h(x, i) = (h(x) + i^2) mod m, where m is the hash table size and i = 0, 1, 2, ..., m-1

(a) Insertion(x,T): This operation inserts an element x into a hash table T. If there is a collision while inserting, it applies quadratic probing.

Procedure:

- First evaluate h(x). If location h(x) is empty, place x into h(x).
- If there is a collision, apply quadratic probing to locate another slot. If this slot is empty, place x into it; otherwise probe another location. This procedure is repeated until an empty slot is found.
- If there is no empty location in the hash table, insertion is not possible.

Insert 21: h(21) = 21 mod 10 = 1 (place 21 at the 1st location)

  Index:  0   1   2   3  4  5  6  7   8   9
  Key:    49  21  58  E  E  E  E  28  18  59

Insert 33: h(33) = 33 mod 10 = 3 (place 33 at the 3rd location)

  Index:  0   1   2   3   4  5  6  7   8   9
  Key:    49  21  58  33  E  E  E  28  18  59
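The probe function h(x, i) = (h(x) + i^2) mod m can be tried out with a short C sketch. The insertion order 59, 18, 49, 58, 28, 21, 33 used in the note below is an assumption that reproduces the final table shown above; the names qp_probe, qp_insert and qp_search and the EMPTY sentinel are made up for this illustration.

```c
#define M 10            /* table size of the example */
#define EMPTY (-1)      /* sentinel for an unused slot (keys assumed >= 0) */

static int slots[M] = { EMPTY, EMPTY, EMPTY, EMPTY, EMPTY,
                        EMPTY, EMPTY, EMPTY, EMPTY, EMPTY };

/* i-th probe position: h(x, i) = (h(x) + i*i) mod M, with h(x) = x mod M. */
int qp_probe(int key, int i)
{
    return (key % M + i * i) % M;
}

/* Insert key at the first empty slot of its probe sequence.
   Returns the slot index, or -1 if no empty slot was found in M probes. */
int qp_insert(int key)
{
    for (int i = 0; i < M; i++) {
        int pos = qp_probe(key, i);
        if (slots[pos] == EMPTY) {
            slots[pos] = key;
            return pos;
        }
    }
    return -1;
}

/* Return the slot index of key, or -1 if it is not in the table. */
int qp_search(int key)
{
    for (int i = 0; i < M; i++) {
        int pos = qp_probe(key, i);
        if (slots[pos] == EMPTY)
            return -1;              /* an empty slot ends the search */
        if (slots[pos] == key)
            return pos;
    }
    return -1;
}
```

With the assumed order, 49 collides at slot 9 and lands in slot 0 (i = 1), 58 lands in slot 2 (i = 2), and 28 lands in slot 7 (i = 3). Because the offsets i*i mod 10 only take the values 0, 1, 4, 5, 6 and 9, an insertion into a table of size 10 can fail even though free slots remain.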

(b) Search(x,T): The search operation looks for an element x in the hash table T and returns "search was successful" if the element is found. Otherwise it returns "search was unsuccessful".

Procedure:

- First evaluate h(x) and examine slot h(x). Examining a slot is known as a probe.
- If slot h(x) contains key x, the search is successful. If the slot contains NIL, the search is unsuccessful.
- There is a third possibility: slot h(x) contains a key that is not x. In that case compute the index of some other slot, based on x and on which probe we are on (apply quadratic probing).
- Keep probing until we either find key x or find a slot holding NIL.

    /* Assumes the table size m and the helpers hashfunction() and empty()
       are defined elsewhere. Returns the slot index of key, or -1. */
    int search(int key, int r[])
    {
        int h, i, j;

        h = hashfunction(key);      /* computes h(x) */
        i = h;                      /* probe 0 is the home slot */
        for (j = 1; j < m && !empty(r[i]) && r[i] != key; j++)
            i = (h + j * j) % m;    /* j-th probe: (h(x) + j^2) mod m */
        if (r[i] == key)
            return i;
        else
            return -1;
    }

(c) Delete(x,T): This operation deletes the key x from the hash table T.

Procedure:

- First evaluate h(x) and examine slot h(x). Examining a slot is known as a probe.
- If slot h(x) contains key x, delete the element and mark that location as deleted rather than resetting it to NIL, so that later searches do not stop early at that slot. Otherwise apply quadratic probing to locate the element. If the element is still not found after probing, display "Deletion is impossible because the element was not in the table".

Disadvantage: Quadratic probing can suffer from secondary clustering: if two keys have the same initial probe position, then their probe sequences are the same.

3. Double Hashing: It uses two different hash functions.

h(x, i) = (h1(x) + i·h2(x)) mod m, where h1(x) is the first hash function, h2(x) is the second hash function, m is the hash table size, and i = 0, 1, 2, 3, ...

- Here h1 and h2 are two auxiliary hash functions. h1 gives the initial probe; h2 gives the step for the remaining probes.
- h2(x) must be relatively prime to m, so that the probe sequence is a full permutation of 0, 1, ..., m-1.
- Choose m to be a power of 2 and have h2(x) always return an odd number. Or, let m be prime and have 0 < h2(x) < m.
- Suppose h1(x) = x mod m; then h2 can be selected as h2(x) = R - (x mod R), where R is a prime number near to (and smaller than) m.

Insert(x,T): This function inserts the key value x into the hash table T.

Procedure:

- First evaluate h1(x). If location h1(x) is empty, place x into h1(x).
- If there is a collision, apply double hashing to locate another slot. If this slot is empty, place x into it; otherwise probe another location. This procedure is repeated until an empty slot is found.
- If there is no empty location in the hash table, insertion is not possible.

For example, consider a hash table of size 10 and insert the elements 59, 18, 49, 58, 21, 33 into it (E indicates empty).

  Index:  0  1  2  3  4  5  6  7  8  9
  Key:    E  E  E  E  E  E  E  E  E  E

Insert 59: h1(59) = 59 mod 10 = 9 (place 59 at the 9th location).

  Index:  0  1  2  3  4  5  6  7  8  9
  Key:    E  E  E  E  E  E  E  E  E  59

Insert 18: h1(18) = 18 mod 10 = 8 (place 18 at the 8th location).

  Index:  0  1  2  3  4  5  6  7  8   9
  Key:    E  E  E  E  E  E  E  E  18  59

Insert 49: h1(49) = 49 mod 10 = 9 (the 9th location is already occupied, so apply double hashing).

h(49, 1) = (h1(49) + 1·h2(49)) mod 10, where h2(49) = R - (49 mod R) = 7 - (49 mod 7) = 7 - 0 = 7 (R is chosen as 7, a prime near m).

h(49, 1) = (9 + 1·7) mod 10 = 16 mod 10 = 6 (place 49 at the 6th position).

  Index:  0  1  2  3  4  5  6   7  8   9
  Key:    E  E  E  E  E  E  49  E  18  59
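The worked example can be checked with a short C sketch of double hashing that uses the same h1(x) = x mod 10 and h2(x) = 7 - (x mod 7). The names dh_insert and dh_search and the EMPTY sentinel are assumptions made for this illustration, and the positions of 58, 21 and 33 mentioned below come from running the same rule, not from the notes.

```c
#define M 10            /* table size of the example */
#define R 7             /* prime smaller than M, as chosen in the example */
#define EMPTY (-1)      /* sentinel for an unused slot (keys assumed >= 0) */

static int slots[M] = { EMPTY, EMPTY, EMPTY, EMPTY, EMPTY,
                        EMPTY, EMPTY, EMPTY, EMPTY, EMPTY };

int hash1(int key) { return key % M; }          /* initial probe position */
int hash2(int key) { return R - (key % R); }    /* step size, always in 1..R */

/* Insert key at the first empty slot of the sequence
   (hash1 + i * hash2) mod M. Returns the slot index, or -1 on failure. */
int dh_insert(int key)
{
    int inc = hash2(key);
    for (int i = 0; i < M; i++) {
        int pos = (hash1(key) + i * inc) % M;
        if (slots[pos] == EMPTY) {
            slots[pos] = key;
            return pos;
        }
    }
    return -1;
}

/* Return the slot index of key, or -1 if it is not in the table. */
int dh_search(int key)
{
    int inc = hash2(key);
    for (int i = 0; i < M; i++) {
        int pos = (hash1(key) + i * inc) % M;
        if (slots[pos] == EMPTY)
            return -1;              /* an empty slot ends the search */
        if (slots[pos] == key)
            return pos;
    }
    return -1;
}
```

Continuing the example with this sketch, 58 collides at slot 8 and moves by h2(58) = 5 to slot 3, 21 goes straight to slot 1, and 33 collides at slot 3 and moves by h2(33) = 2 to slot 5. Since 10 is not prime, steps such as 2 and 5 share a factor with the table size and cannot reach every slot, which is why the notes ask for h2(x) to be relatively prime to m.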

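Algorithm for Double Hashing: Search

    /* hash1(key) is the first hash function and hash2(key) is the second.
       Assumes the table size m and the helper empty() are defined elsewhere.
       Returns the slot index of key, or -1. */
    int search(int key, int r[])
    {
        int i, inc, last;

        i = hash1(key);                     /* initial probe */
        inc = hash2(key);                   /* step between probes */
        last = (i + (m - 1) * inc) % m;     /* final slot of the probe sequence */
        while (i != last && !empty(r[i]) && r[i] != key)
            i = (i + inc) % m;
        if (r[i] == key)
            return i;
        else
            return -1;
    }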

(c) Delete(x,T): This operation deletes the key x from the hash table T.

Procedure:

- First compute h1(x) and examine slot h1(x). Examining a slot is known as a probe.
- If slot h1(x) contains key x, delete the element and mark that location as deleted rather than resetting it to NIL, so that later searches do not stop early at that slot. Otherwise apply double hashing to locate the element. If the element is still not found after probing, display "Deletion is impossible because the element was not in the table".

Advantages: Double hashing distributes keys more uniformly than linear probing.

A Comparison of Rehashing Methods

  Linear probing      m distinct probe sequences      Primary clustering
  Quadratic probing   m distinct probe sequences      No primary clustering, but secondary clustering
  Double hashing      m^2 distinct probe sequences    No primary clustering; no secondary clustering

HASHING FUNCTIONS

Choosing a good hashing function, h(k), is essential for hash-table based searching. h should distribute the elements of our collection as uniformly as possible to the "slots" of the hash table. The key criterion is that there should be a minimum number of collisions.

If the probability that a key, k, occurs in our collection is P(k), then if there are m slots in our hash table, a uniform hashing function, h(k), would ensure that every slot is equally likely to be chosen:

\[ \sum_{k \,:\, h(k)=j} P(k) = \frac{1}{m} \quad \text{for } j = 0, 1, \ldots, m-1 \]

Sometimes, this is easy to ensure. For example, if the keys are randomly distributed in [0, r), then

h(k) = floor((mk)/r)

will provide uniform hashing.

Mapping keys to natural numbers

Most hashing functions will first map the keys to some set of natural numbers, say [0, r). There are many ways to do this. For example, if the key is a string of ASCII characters, we can simply add the ASCII representations of the characters mod 255 to produce a number in [0, 254], or we could xor them, or we could add them in pairs mod 2^16 - 1, or ...

Having mapped the keys to a set of natural numbers, we then have a number of possibilities.

1. MOD FUNCTION:

h(k) = k mod m.

When using this method, we usually avoid certain values of m. Powers of 2 are usually avoided, because k mod 2^b simply selects the b low-order bits of k. Unless we know that all the 2^b possible values of the low-order bits are equally likely, this will not be a good choice, because some bits of the key are not used in the hash function.

Prime numbers which are close to powers of 2 seem to be generally good choices for m.

For example, if we have 4000 elements and have chosen an overflow table organization, but wish to keep the probability of collisions quite low, then we might choose m = 4093. (4093 is the largest prime less than 4096 = 2^12.)
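The effect of choosing a power of 2 for m can be seen in a few lines of C. This is a small sketch; the function names hash_mod and low_bits and the sample values are chosen only for illustration.

```c
/* Division-method hash: h(k) = k mod m. */
unsigned int hash_mod(unsigned int k, unsigned int m)
{
    return k % m;
}

/* The b low-order bits of k, which is the same value as k mod 2^b.
   Assumes b is smaller than the bit width of unsigned int. */
unsigned int low_bits(unsigned int k, unsigned int b)
{
    return k & ((1u << b) - 1u);
}
```

For instance, the keys 16, 32, 48 and 64 differ only above their 4 low-order bits, so with m = 16 they all hash to slot 0, while with the prime m = 13 they hash to slots 3, 6, 9 and 12.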

2. MULTIPLICATION METHOD:

  o Multiply the key by a constant A, 0 < A < 1,
  o Extract the fractional part of the product,
  o Multiply this value by m.

Thus the hash function is:

h(k) = floor(m * (kA - floor(kA)))
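The formula translates almost directly into C using floating-point arithmetic. This is a rough sketch: the name hash_mult is made up, and the constant A = 0.6180339887 used in the note below (about (sqrt(5) - 1)/2, a commonly suggested value) is an assumption, since the notes do not fix A.

```c
/* Multiplication-method hash: h(k) = floor(m * (k*A - floor(k*A))),
   where 0 < A < 1. Returns a slot index in 0..m-1. */
unsigned int hash_mult(unsigned int k, unsigned int m, double A)
{
    double product = (double)k * A;
    /* Fractional part of k*A: subtract the integer part. */
    double frac = product - (double)(unsigned long long)product;
    /* Scale to the table size; the conversion truncates toward zero,
       which acts as floor for a non-negative value. */
    return (unsigned int)((double)m * frac);
}
```

For example, with k = 123456, m = 16384 and A = 0.6180339887, the product kA is about 76300.0041, its fractional part is about 0.0041, and multiplying by 16384 gives about 67.3, so the key hashes to slot 67.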

In this case, the value of m is not critical, and we typically choose a power of 2 so that we can use the following efficient procedure on most digital computers:

  o Choose m = 2^p.