Algorithm Design and Analysis: Introduction to Algorithm Design Techniques, Lecture notes of Design and Analysis of Algorithms

Detailed Lecture Notes with Description

Typology: Lecture notes

2017/2018

Uploaded on 01/26/2022

vasundhara-jhobta
15CS204J - ALGORITHM DESIGN AND ANALYSIS, UNIT-I
Ms. Selva Mary G

UNIT I

INTRODUCTION TO ALGORITHM DESIGN

Introduction - Fundamentals of algorithm (Line count, operation count) - Algorithm Design Techniques (Approaches, Design Paradigms) - Designing an algorithm and its Analysis (Best, Worst & Average case) - Asymptotic Notations based on Orders of Growth - Mathematical Analysis - Induction - Recurrence Relation: Substitution method, Recursion method, Master's Theorem.

INTRODUCTION

An algorithm is a step-by-step procedure that defines a set of instructions to be executed in a certain order to produce the desired output. Algorithms are generally created independently of the underlying language, i.e. an algorithm can be implemented in more than one programming language. An algorithm can be defined in several ways:

  • An algorithm is a set of rules for carrying out calculation either by hand or on a machine.
  • An algorithm is a finite step-by-step procedure to achieve a required result.
  • An algorithm is a sequence of computational steps that transform the input into the output.
  • An algorithm is a sequence of operations performed on data that have to be organized in data structures.
  • An algorithm is an abstraction of a program to be executed on a physical machine (model of Computation).

Characteristics of an Algorithm

Not all procedures can be called an algorithm. An algorithm should have the following characteristics:

  • Unambiguous − An algorithm should be clear and unambiguous. Each of its steps (or phases) and their inputs/outputs should be clear and must lead to only one meaning.
  • Input − An algorithm should have 0 or more well-defined inputs.
  • Output − An algorithm should have 1 or more well-defined outputs, and they should match the desired output.
  • Finiteness − An algorithm must terminate after a finite number of steps.
  • Feasibility − An algorithm should be feasible with the available resources.
  • Independent − An algorithm should have step-by-step directions that are independent of any programming code.

ALGORITHM DESIGN TECHNIQUES (APPROACHES, DESIGN PARADIGMS)

Algorithm design techniques are general approaches to the construction of efficient solutions to problems. Such methods are of interest because:

  • They provide templates suited to solving a broad range of diverse problems.
  • They can be translated into common control and data structures provided by most high-level languages.
  • The temporal and spatial requirements of the resulting algorithms can be precisely analyzed.

Although more than one technique may be applicable to a specific problem, it is often the case that an algorithm constructed by one approach is clearly superior to equivalent solutions built using alternative techniques.

Brute Force

Brute force is a straightforward approach to solving a problem, based directly on the problem's statement and the definitions of the concepts involved. It is considered one of the easiest approaches to apply and is useful for solving small instances of a problem. Some examples of brute force algorithms are:

  • Computing a^n (a > 0, n a nonnegative integer) by multiplying a*a*…*a
  • Computing n!
  • Selection sort, Bubble sort
  • Sequential search
  • Exhaustive search: Traveling Salesman Problem, Knapsack problem.
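
The first brute-force example in the list, computing a^n by repeated multiplication, can be sketched in C as follows. This is only an illustration: the function name `power_brute` and the use of `double` are assumptions, not part of the original notes.

```c
#include <assert.h>

/* Brute-force exponentiation (hypothetical helper name):
   multiply a by itself n times, using exactly n multiplications,
   so the running time grows linearly with n. */
double power_brute(double a, unsigned int n)
{
    assert(a > 0.0);                  /* the notes assume a > 0 */
    double result = 1.0;              /* a^0 = 1 */
    for (unsigned int i = 0; i < n; i++)
        result *= a;
    return result;
}
```

For example, `power_brute(2.0, 10)` performs ten multiplications and yields 1024.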
Divide-and-Conquer, Decrease-and-Conquer

These are methods of designing algorithms that (informally) proceed as follows: given an instance of the problem to be solved, split it into several smaller sub-instances (of the same problem), independently solve each of the sub-instances, and then combine the sub-instance solutions to yield a solution for the original instance. With the divide-and-conquer method the size of the problem instance is reduced by a factor (e.g. half the input size), while with the decrease-and-conquer method the size is reduced by a constant. Examples of divide-and-conquer algorithms:

  • Computing a^n (a > 0, n a nonnegative integer) by recursion
  • Binary search in a sorted array (recursion)
  • Mergesort algorithm, Quicksort algorithm (recursion)
  • The algorithm for solving the fake coin problem (recursion)
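
As a hedged sketch of the first example in this list, a^n can be computed recursively by halving the exponent, so the problem size shrinks by a factor at each step. The name `power_dc` is an assumption made for illustration.

```c
#include <assert.h>

/* Recursive exponentiation by halving the exponent (hypothetical name).
   Each call reduces n to n / 2, so the recursion depth is about log2(n). */
double power_dc(double a, unsigned int n)
{
    assert(a > 0.0);                      /* the notes assume a > 0 */
    if (n == 0)
        return 1.0;                       /* base case: a^0 = 1 */
    double half = power_dc(a, n / 2);     /* solve the smaller sub-instance */
    /* combine: a^n = (a^(n/2))^2, times one extra a when n is odd */
    return (n % 2 == 0) ? half * half : half * half * a;
}
```

Compared with repeated multiplication, this version needs only a logarithmic number of recursive calls.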

Dynamic Programming

Dynamic programming is used when the solution can be recursively described in terms of solutions to subproblems (optimal substructure). The algorithm finds solutions to subproblems and stores them in memory for later use. This is more efficient than brute-force methods, which solve the same subproblems over and over again.

  • Optimal substructure: the optimal solution to a problem consists of optimal solutions to subproblems.
  • Overlapping subproblems: there are few subproblems in total, but many recurring instances of each.
  • Bottom-up approach: solve the smallest subproblems first, building a table of solved subproblems that are used to solve larger ones.

Examples:

  • Fibonacci numbers computed by iteration
  • Warshall's algorithm implemented by iteration
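
The bottom-up idea can be illustrated with the Fibonacci example. The sketch below fills a table from the smallest subproblems upward. The function name `fib_bottom_up` and the limit of 93 (the largest index whose value fits in 64 unsigned bits) are choices made for this illustration.

```c
#include <assert.h>

/* Bottom-up Fibonacci (hypothetical name): each subproblem F(i) is solved
   once and stored in a table, then reused to solve F(i + 1) and F(i + 2). */
unsigned long long fib_bottom_up(unsigned int n)
{
    assert(n <= 93);                     /* F(94) would overflow 64 bits */
    unsigned long long table[94];
    table[0] = 0;                        /* base cases */
    table[1] = 1;
    for (unsigned int i = 2; i <= n; i++)
        table[i] = table[i - 1] + table[i - 2];
    return table[n];
}
```

Each table entry is computed exactly once, so the running time is linear in n, whereas a naive recursive version recomputes the same subproblems many times.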
Backtracking Methods

The method is used for state-space search problems. State-space search problems are problems where the problem representation consists of:

  • an initial state
  • goal state(s)
  • a set of intermediate states
  • a set of operators that transform one state into another. Each operator has preconditions and postconditions.
  • a cost function, which evaluates the cost of the operations (optional)
  • a utility function, which evaluates how close a given state is to the goal state (optional)

The solving process is based on the construction of a state-space tree, whose nodes represent states; the root represents the initial state, and one or more leaves are goal states. Each edge is labeled with some operator. If a node b is obtained from a node a as a result of applying the operator O, then b is a child of a and the edge from a to b is labeled with O. The solution is obtained by searching the tree until a goal state is found.

Backtracking uses depth-first search, usually without a cost function. The main algorithm is as follows:

  1. Store the initial state in a stack.
  2. While the stack is not empty, do:
     a. Read a node from the stack.
     b. While there are available operators, do:
        i. Apply an operator to generate a child.
        ii. If the child is a goal state, stop.
        iii. If it is a new state, push the child onto the stack.

The utility function is used to tell how close a given state is to the goal state and whether a given state may be considered a goal state. If no children can be generated from a given node, then we backtrack: read the next node from the stack.

Branch-and-Bound

Branch-and-bound is used when we can evaluate each node using the cost and utility functions. At each step we choose the best node to proceed further. Branch-and-bound algorithms are implemented using a priority queue. The state-space tree is built in a breadth-first manner. Example: the 8-puzzle problem. The cost function is the number of moves. The utility function evaluates how close a given state of the puzzle is to the goal state, e.g. by counting how many tiles are not in place.
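
To make the backtracking idea from this section concrete, here is a minimal sketch on a problem chosen purely for illustration (subset sum over positive integers, which is not an example from the notes). A state is the pair (index, remaining), the two operators are "take the current item" and "skip it", and the goal state is remaining == 0. The recursion's call stack plays the role of the explicit stack in the algorithm described earlier. The name `subset_sum` and its signature are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Depth-first backtracking search (hypothetical example problem):
   is there a subset of items[index..count-1] that sums to `remaining`?
   Assumes all items are positive, so a negative remainder is a dead end. */
bool subset_sum(const int items[], int count, int index, int remaining)
{
    assert(index >= 0 && index <= count);
    if (remaining == 0)
        return true;                       /* goal state reached */
    if (index == count || remaining < 0)
        return false;                      /* no child can succeed: backtrack */
    /* operator 1: take items[index] */
    if (subset_sum(items, count, index + 1, remaining - items[index]))
        return true;
    /* operator 2: skip items[index] */
    return subset_sum(items, count, index + 1, remaining);
}
```

A call such as `subset_sum(items, 6, 0, 9)` starts from the initial state and explores the state-space tree depth-first, abandoning a branch as soon as it cannot lead to a goal.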

ALGORITHM ANALYSIS

An algorithm is said to be efficient if it takes less time to execute and consumes less memory space. The performance of an algorithm is measured on the basis of the following properties:

  1. Time Complexity
  2. Space Complexity

Suppose X is an algorithm and n is the size of the input data. The time and space used by algorithm X are the two main factors that decide the efficiency of X.

  • Time Factor − The time is measured by counting the number of key operations, such as comparisons in a sorting algorithm.
  • Space Factor − The space is measured by counting the maximum memory space required by the algorithm.

The complexity of an algorithm f(n) gives the running time and/or storage space required by the algorithm in terms of n, the size of the input data.

There are four rules to count the operations:

Rule 1: for loops - the size of the loop times the running time of the body

The running time of a for loop is at most the running time of the statements inside the loop times the number of iterations.

    for (i = 0; i < n; i++)
        sum = sum + i;

a. Find the running time of the statements when executed only once. The statements in the loop heading have a fixed number of operations, hence they have constant running time O(1) when executed only once. The statement in the loop body also has a fixed number of operations, hence it has constant running time when executed only once.

b. Find how many times each statement is executed.

    for (i = 0; i < n; i++)   // i = 0:  executed only once, O(1)
                              // i < n:  executed n + 1 times, O(n)
                              // i++:    executed n times, O(n)
                              // total time of the loop heading:
                              // O(1) + O(n) + O(n) = O(n)
        sum = sum + i;        // executed n times, O(n)

The loop heading plus the loop body give O(n) + O(n) = O(n), so the loop running time is O(n).

Mathematical analysis of how many times the statements in the body are executed: if a) the size of the loop is n (the loop variable runs from 0, or some fixed constant, to n) and b) the body has constant running time (no nested loops), then the time is O(n).

Rule 2: Nested loops - the product of the sizes of the loops times the running time of the body

The total running time is the running time of the inside statements times the product of the sizes of all the loops.

    sum = 0;
    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++)
            sum++;

Applying Rule 1 to the nested loop (the 'j' loop), we get O(n) for the body of the outer loop. The outer loop runs n times, therefore the total time for the nested loops is O(n) * O(n) = O(n*n) = O(n^2).

Analysis: what happens if the inner loop does not start from 0?

    sum = 0;
    for (i = 0; i < n; i++)
        for (j = i; j < n; j++)
            sum++;
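
One way to explore the question about the inner loop starting at i is to count the increments directly. In the sketch below (the function name `count_triangular` is an assumption), the inner loop runs n, n-1, ..., 1 times, so the total is n(n+1)/2. That is roughly half of n^2, but still O(n^2).

```c
#include <assert.h>

/* Counts how many times sum++ executes when the inner loop starts at i
   (hypothetical helper for illustration). */
unsigned long count_triangular(unsigned int n)
{
    unsigned long sum = 0;
    for (unsigned int i = 0; i < n; i++)
        for (unsigned int j = i; j < n; j++)
            sum++;
    /* closed form: n + (n - 1) + ... + 1 = n(n + 1) / 2 */
    assert(sum == (unsigned long)n * (n + 1) / 2);
    return sum;
}
```

Starting the inner loop at i changes the constant factor but not the order of growth.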

Rule 3: Consecutive program fragments

The total running time is the maximum of the running times of the individual fragments.

    sum = 0;
    for (i = 0; i < n; i++)
        sum = sum + i;

    sum = 0;
    for (i = 0; i < n; i++)
        for (j = 0; j < 2*n; j++)
            sum++;

The first loop runs in O(n) time and the second in O(n^2) time; the maximum is O(n^2).

Rule 4: If statement

    if C
        S1;
    else
        S2;

The running time is the maximum of the running times of S1 and S2.

Summary

Steps in the analysis of non-recursive algorithms:

  • Decide on a parameter n indicating the input size.
  • Identify the algorithm's basic operation.
  • Check whether the number of times the basic operation is executed depends on some additional property of the input. If so, determine the worst, average, and best cases for input of size n.
  • Count the number of operations using the rules above.

ASYMPTOTIC NOTATIONS

The main idea of asymptotic analysis is to have a measure of the efficiency of algorithms that does not depend on machine-specific constants and does not require algorithms to be implemented and the time taken by programs to be compared. Asymptotic notations are mathematical tools to represent the time complexity of algorithms for asymptotic analysis. The following 3 asymptotic notations are mostly used to represent the time complexity of algorithms.

1. Θ Notation: The theta notation bounds a function from above and below, so it defines exact asymptotic behavior. A simple way to get the Theta notation of an expression is to drop low-order terms and ignore leading constants. For example, consider the following expression:

$$3n^3 + 6n^2 + 6000 = \Theta(n^3)$$

Dropping lower-order terms is always fine because there will always be an n0 after which Θ(n^3) beats Θ(n^2), irrespective of the constants involved. For a given function g(n), Θ(g(n)) denotes the following set of functions:

$$\Theta(g(n)) = \{\, f(n) : \text{there exist positive constants } c_1, c_2 \text{ and } n_0 \text{ such that } 0 \le c_1\, g(n) \le f(n) \le c_2\, g(n) \text{ for all } n \ge n_0 \,\}$$

The above definition means that if f(n) is theta of g(n), then the value of f(n) is always between c1*g(n) and c2*g(n) for large values of n (n >= n0). The definition of theta also requires that f(n) be non-negative for values of n greater than n0.
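
As a worked check of the Θ definition against the example expression 3n^3 + 6n^2 + 6000, one possible choice of constants is c1 = 3, c2 = 5 and n0 = 19. This is an illustration, and many other choices would also work:

```latex
\begin{align*}
f(n) &= 3n^3 + 6n^2 + 6000 \\
f(n) &\ge 3n^3 && \text{for all } n \ge 0 \quad (c_1 = 3) \\
6n^2 &\le n^3 && \text{for all } n \ge 6 \\
6000 &\le n^3 && \text{for all } n \ge 19 \quad (19^3 = 6859) \\
f(n) &\le 3n^3 + n^3 + n^3 = 5n^3 && \text{for all } n \ge 19 \quad (c_2 = 5,\ n_0 = 19)
\end{align*}
```

Both bounds hold for every n >= 19, which is exactly what membership in Θ(n^3) requires.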

2. Big O Notation: The Big O notation defines an upper bound of an algorithm; it bounds a function only from above. For example, consider the case of Insertion Sort. It takes linear time in the best case and quadratic time in the worst case. We can safely say that the time complexity of Insertion Sort is O(n^2). Note that O(n^2) also covers linear time. If we use Θ notation to represent the time complexity of Insertion Sort, we have to use two statements for the best and worst cases:

  1. The worst case time complexity of Insertion Sort is Θ(n^2).
  2. The best case time complexity of Insertion Sort is Θ(n).

The Big O notation is useful when we only have an upper bound on the time complexity of an algorithm. Many times we can easily find an upper bound by simply looking at the algorithm.

$$O(g(n)) = \{\, f(n) : \text{there exist positive constants } c \text{ and } n_0 \text{ such that } 0 \le f(n) \le c\, g(n) \text{ for all } n \ge n_0 \,\}$$

Linear and Binary search

An algorithm is a step-by-step procedure or method for solving a problem by a computer in a given number of steps. The steps of an algorithm may include repetition, depending on the problem for which the algorithm is being developed. The algorithm is written in a human-readable and understandable form. To search for an element in a given array, two methods can be used: linear search and binary search.

Linear Search

A linear search is the most basic search algorithm. It scans the array in sequence until the desired element is found: it compares the key with each element of the list in turn and, if an element matches, returns that element's index; otherwise it returns -1. Linear search is applied to unsorted or unordered lists when there are few elements in the list.

Pseudocode:

Input: Array D, integer key

Output: first index of key in D, or -1 if not found

    For i = 0 to last index of D:
        if D[i] == key:
            return i
    return -1

Example with Implementation

To search for the element 5, the algorithm goes step by step in sequence order.

    linear(a[n], key)
        for (i = 0; i < n; i++)
            if (a[i] == key)
                return i;
        return -1;

Asymptotic Analysis

Worst Case Analysis (Usually Done)

In the worst case analysis, we calculate an upper bound on the running time of an algorithm. We must know the case that causes the maximum number of operations to be executed. For linear search, the worst case happens when the element to be searched for (key in the above code) is not present in the array. When key is not present, the linear() function compares it with all the elements of the array one by one. Therefore, the worst case time complexity of linear search is Θ(n).
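
The worst-case claim can be checked by instrumenting linear search to count its comparisons. The following is a sketch under assumptions: the name `linear_search_count` and the extra `comparisons` output parameter are not part of the original code.

```c
#include <assert.h>
#include <stddef.h>

/* Linear search that also reports how many element comparisons it made
   (hypothetical instrumented variant). Returns the first index of key,
   or -1 if key is not present. */
int linear_search_count(const int a[], int n, int key, int *comparisons)
{
    assert(n >= 0 && comparisons != NULL);
    *comparisons = 0;
    for (int i = 0; i < n; i++) {
        (*comparisons)++;            /* one comparison of a[i] with key */
        if (a[i] == key)
            return i;
    }
    return -1;                       /* key absent: all n elements compared */
}
```

When the key is absent, the counter ends at n, matching the worst case described above.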

Average Case Analysis (Sometimes done)

In average case analysis, we take all possible inputs and calculate the computing time for each of them, then sum all the calculated values and divide the sum by the total number of inputs. We must know (or predict) the distribution of cases. For the linear search problem, let us assume that the key is present and is equally likely to be in any position in the array:

  • If the key is in the first array position: 1 comparison
  • If the key is in the second array position: 2 comparisons
  • ...
  • If the key is in the ith position: i comparisons

Averaging all these possibilities gives (1+2+3+...+n)/n = [n(n+1)/2]/n = (n+1)/2 comparisons. The average number of comparisons is (n+1)/2 = Θ(n).

Best Case Analysis (Bogus)

In the best case analysis, we calculate a lower bound on the running time of an algorithm. We must know the case that causes the minimum number of operations to be executed. In the linear search problem, the best case occurs when key is present at the first location. The number of operations in the best case is constant (not dependent on n), so the time complexity in the best case is Θ(1).

Binary Search

Binary search is applied to a sorted array or list. In binary search, we first compare the key with the element in the middle position of the array. If they match, we return the index of the middle element. If the key is less than the middle element, it must lie in the lower half of the array; if it is greater, it must lie in the upper half. We repeat this procedure on the lower (or upper) half of the array. Binary search is useful when there is a large number of elements in an array.

    binarysearch(a[n], key, low, high)
        while (low <= high) {
            mid = (low + high) / 2;
            if (a[mid] == key)
                return mid;
            else if (a[mid] > key)
                high = mid - 1;
            else
                low = mid + 1;
        }
        return -1;
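
A self-contained C version of the binary search described above might look like the sketch below. The function name `binary_search` and the choice to compute low and high internally (rather than take them as parameters) are assumptions made for this example. The array must already be sorted in ascending order.

```c
#include <assert.h>

/* Iterative binary search on a sorted (ascending) array.
   Returns an index of key, or -1 if key is not present. */
int binary_search(const int a[], int n, int key)
{
    assert(n >= 0);
    int low = 0, high = n - 1;
    while (low <= high) {
        int mid = low + (high - low) / 2;   /* same midpoint, avoids overflow of low + high */
        if (a[mid] == key)
            return mid;
        else if (a[mid] > key)
            high = mid - 1;                 /* key can only be in the lower half */
        else
            low = mid + 1;                  /* key can only be in the upper half */
    }
    return -1;
}
```

Each iteration halves the remaining range, so the loop runs at most about log2(n) + 1 times.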

Bubble Sort

Sorting takes place by stepping through all the data items one by one in pairs, comparing adjacent data items and swapping each pair that is out of order.

Sorting using Bubble Sort Algorithm

Let's consider an array with values {5, 1, 6, 2, 4, 3}.

    int a[6] = {5, 1, 6, 2, 4, 3};
    int i, j, temp;
    for (i = 0; i < 6; i++) {
        for (j = 0; j < 6-i-1; j++) {
            if (a[j] > a[j+1]) {
                temp = a[j];
                a[j] = a[j+1];
                a[j+1] = temp;
            }
        }
    }
    // now you can print the sorted array

Above is the algorithm to sort an array using Bubble Sort. Although the above logic will sort an unsorted array, the algorithm isn't efficient and can be enhanced further: as written, the outer for loop keeps going for six iterations even if the array gets sorted after the second iteration. Hence we can insert a flag and keep checking whether any swapping of elements takes place. If no swapping takes place, the array is sorted and we can jump out of the for loop.

    int a[6] = {5, 1, 6, 2, 4, 3};
    int i, j, temp;
    for (i = 0; i < 6; i++) {
        int flag = 0;               // taking a flag variable
        for (j = 0; j < 6-i-1; j++) {
            if (a[j] > a[j+1]) {
                temp = a[j];
                a[j] = a[j+1];
                a[j+1] = temp;
                flag = 1;           // setting flag to 1 if swapping occurs
            }
        }
        if (!flag) {                // breaking out of the for loop if no swapping takes place
            break;
        }
    }

In the above code, if in a complete single cycle of the j iteration (inner for loop) no swapping takes place and flag remains 0, then we break out of the outer for loop, because the array has already been sorted.

Complexity Analysis of Bubble Sorting

In Bubble Sort, n-1 comparisons are done in the 1st pass, n-2 in the 2nd pass, n-3 in the 3rd pass, and so on. So the total number of comparisons is

$$(n-1) + (n-2) + (n-3) + \dots + 3 + 2 + 1 = \frac{n(n-1)}{2}, \quad \text{i.e. } O(n^2)$$

Hence the complexity of Bubble Sort is O(n^2). The main advantage of Bubble Sort is the simplicity of the algorithm. The space complexity of Bubble Sort is O(1), because only a single additional memory location is required, for the temp variable. The best-case time complexity is O(n); it occurs when the list is already sorted and the flag check ends the sort after the first pass.
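
The comparison counts quoted for Bubble Sort can be checked with an instrumented version of the flagged algorithm. The sketch below wraps the same logic in a function. The name `bubble_sort_count` and the returned comparison counter are additions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Bubble sort with the early-exit flag; returns the number of
   adjacent-element comparisons performed (hypothetical helper). */
unsigned long bubble_sort_count(int a[], size_t n)
{
    assert(a != NULL || n == 0);
    unsigned long comparisons = 0;
    for (size_t i = 0; i + 1 < n; i++) {
        int swapped = 0;
        for (size_t j = 0; j + 1 < n - i; j++) {
            comparisons++;
            if (a[j] > a[j + 1]) {
                int temp = a[j];
                a[j] = a[j + 1];
                a[j + 1] = temp;
                swapped = 1;
            }
        }
        if (!swapped)            /* a full pass without swaps: already sorted */
            break;
    }
    return comparisons;
}
```

For six elements, an already sorted input needs 5 comparisons (n - 1, the best case), while a reversed input needs 15 (n(n-1)/2, the worst case).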

Sorting using Insertion Sort Algorithm

    int a[6] = {5, 1, 6, 2, 4, 3};
    int i, j, key;
    for (i = 1; i < 6; i++) {
        key = a[i];
        j = i - 1;
        while (j >= 0 && key < a[j]) {
            a[j+1] = a[j];
            j--;
        }
        a[j+1] = key;
    }

Now let's understand the above simple insertion sort algorithm. We took an array with 6 integers. We took a variable key, in which we put each element of the array in turn, one per pass, starting from the second element, that is a[1]. Then, using the while loop, we move j towards the front of the array, shifting each element greater than key one position to the right, until j drops below zero or we find an element that is not greater than key; we then insert key at position j+1.

In the above array, first we pick 1 as key and compare it with 5 (the element before 1); 1 is smaller than 5, so 1 is moved before 5. Then we pick 6 and compare it with 5; no shifting this time. Then 2 becomes the key and is compared with 6 and 5, which are shifted right, and then with 1; 2 is placed after 1. This goes on until the complete array is sorted.

Complexity Analysis of Insertion Sorting

  • Worst Case Time Complexity : O(n^2)
  • Best Case Time Complexity : O(n)
  • Average Time Complexity : O(n^2)
  • Space Complexity : O(1)
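
The gap between the best and worst cases listed above can be observed by counting key comparisons. The following sketch mirrors the insertion sort shown earlier. The function name `insertion_sort_count` and the returned counter are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Insertion sort that returns how many times key was compared with an
   array element (hypothetical instrumented variant). */
unsigned long insertion_sort_count(int a[], size_t n)
{
    assert(a != NULL || n == 0);
    unsigned long comparisons = 0;
    for (size_t i = 1; i < n; i++) {
        int key = a[i];
        size_t j = i;                    /* key will end up at index j */
        while (j > 0) {
            comparisons++;
            if (!(key < a[j - 1]))
                break;                   /* found an element not greater than key */
            a[j] = a[j - 1];             /* shift the larger element right */
            j--;
        }
        a[j] = key;
    }
    return comparisons;
}
```

On six elements, a sorted input costs 5 comparisons (linear, the best case) and a reversed input costs 15 (n(n-1)/2, the worst case).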

MATHEMATICAL ANALYSIS

A recurrence is a recursive description of a function, or in other words, a description of a function in terms of itself. Like all recursive structures, a recurrence consists of one or more base cases and one or more recursive cases. Each of these cases is an equation or inequality, with some function value f(n) on the left side. The base cases give explicit values for a (typically finite, typically small) subset of the possible values of n. The recursive cases relate the function value f(n) to the function value f(k) for one or more integers k < n; typically, each recursive case applies to an infinite number of possible values of n. For example, the following recurrence (written in two different but standard ways) describes the identity function f(n) = n:

$$f(n) = \begin{cases} 0 & \text{if } n = 0 \\ f(n-1) + 1 & \text{otherwise} \end{cases} \qquad\qquad \begin{aligned} f(0) &= 0 \\ f(n) &= f(n-1) + 1 \quad \text{for all } n > 0 \end{aligned}$$

In both presentations, the first line is the only base case, and the second line is the only recursive case. The same function can satisfy many different recurrences; the identity function, for example, is also described by other recurrences.

We say that a particular function satisfies a recurrence, or is the solution to a recurrence, if each of the statements in the recurrence is true. Most recurrences (at least, those that we will encounter in this class) have a solution; moreover, if every case of the recurrence is an equation, that solution is unique. Specifically, if we transform the recursive formula into a recursive algorithm, the solution to the recurrence is the function computed by that algorithm!

Mathematical Analysis – Induction

Consider a recursive algorithm to compute the maximum element in an array of integers. You may assume the existence of a function "max(a, b)" that returns the maximum of two integers a and b.

Algorithm: Finding the maximum in an array of n elements

    function FIND-ARRAY-MAX(A, n)
        if (n = 1) then
            return (A[1])
        else
            return (max(A[n], FIND-ARRAY-MAX(A, n − 1)))
        end if
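
The recursive maximum-finding algorithm can be sketched in C as follows. The name `find_array_max` is hypothetical, and because C arrays are 0-based, the last of n elements is a[n - 1] rather than A[n].

```c
#include <assert.h>

/* Recursive maximum of the first n elements of a (requires n >= 1).
   Base case: a single element is its own maximum.
   Recursive case: the larger of a[n - 1] and the maximum of the first n - 1. */
int find_array_max(const int a[], int n)
{
    assert(n >= 1);
    if (n == 1)
        return a[0];
    int rest = find_array_max(a, n - 1);      /* maximum of a[0..n-2] */
    return (a[n - 1] > rest) ? a[n - 1] : rest;
}
```

The structure matches an induction argument: the base case handles n = 1, and the recursive case assumes the call on n - 1 elements is correct and extends it to n.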