Algorithm Analysis: Determining Algorithm Running Time and Space Complexity, Study notes of Computer Science

An introduction to algorithm analysis, focusing on determining the running time and space complexity of algorithms. It covers various time complexities, including constant, logarithmic, log-squared, linear, n log n, quadratic, cubic, and exponential functions. The document also includes examples of algorithms and their respective running times for small and moderate input sizes.

Typology: Study notes

Pre 2010

Uploaded on 11/08/2009

koofers-user-x87

COP 3503 – Computer Science II – CLASS NOTES - DAY #3
Algorithm Analysis

Algorithm - a clearly specified set of instructions that the computer will follow to solve a problem.

Algorithm Analysis - determining the amount of resources that the algorithm will require, typically in terms of time and space.

Areas of study include:
  - Estimation techniques for determining the running time of an algorithm.
  - Techniques to reduce the running time of an algorithm.
  - A mathematical framework for the accurate determination of the running time of an algorithm.

The running time of an algorithm is a function of the size of the input. Example: it takes longer to sort 1000 numbers than it does to sort 10 numbers.

The value of this function depends upon many things, including:
  1. The speed of the host computer.
  2. The size of the host computer.
  3. The compilation process (the quality of the compiler-generated code).
  4. The quality of the original source code that implements the algorithm.
[Figure 5.1: Illustration of running time vs. input size for small input sets. Curves shown: linear, N log N, quadratic, cubic. Axes: time (vertical), input size (horizontal).]

When comparing two functions F(N) and G(N), it does not make sense to state simply that F < G, F = G, or G < F, because the ordering can change with N. Example: at some point x, F may be smaller than G, yet at some other point y, F may be equal to or greater than G. Instead, the growth rates of the functions need to be compared.

Definitions (based on the growth rate of the function):

  1. constant function – function whose dominant term is a constant (c)
  2. logarithmic function – dominant term is log N
  3. log-squared function – dominant term is log^2 N
  4. linear function – dominant term is N
  5. N log N function – dominant term is N log N
  6. quadratic function – dominant term is N^2
  7. cubic function – dominant term is N^3
  8. exponential function – dominant term is 2^N

There are three reasons for basing our analysis on the growth rate of the function rather than on its specific value at some point:

  1. For sufficiently large values of N, the value of the function is primarily determined by its dominant term (how large "sufficiently large" is varies by function). Example: consider the cubic function 15N^3 + 20N^2 - 10N + 4. For a large value of N, say 1000, the value of this function is 15,019,990,004, of which 15,000,000,000 is due entirely to the N^3 term. Using only the N^3 term to estimate the value of this function introduces an error of only about 0.13%, which is typically close enough for estimation purposes.
  2. Constants associated with the dominant term are usually not meaningful across different machines (although they may matter when comparing functions with identical growth rates).
  3. Small values of N are generally not important.

Big-Oh Notation

  - Used to represent the growth rate of a function.
  - Allows algorithm designers to establish a relative order among functions by comparing their dominant terms.
  - Denoted as, for example, O(N^2), read as "order N squared".
  - This notation and related notation are examined more formally a bit later (Section 5.4).
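The dominant-term arithmetic above can be checked directly. The following is a minimal sketch; the function names cubic, dominant_term, and relative_error are made up for this illustration and do not come from the notes.

```python
def cubic(n):
    """The example polynomial from the notes: 15N^3 + 20N^2 - 10N + 4."""
    return 15 * n**3 + 20 * n**2 - 10 * n + 4


def dominant_term(n):
    """Only the N^3 term of the polynomial."""
    return 15 * n**3


def relative_error(n):
    """Fraction of the exact value that is lost by keeping only the dominant term."""
    exact = cubic(n)
    return (exact - dominant_term(n)) / exact


# The error shrinks as N grows, which is why small N is not the interesting case.
for n in (10, 100, 1000, 10000):
    print(n, cubic(n), dominant_term(n), f"{relative_error(n):.4%}")
```

For small N the lower-order terms still contribute noticeably, so the estimate is only useful once N is large.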

Examples of Algorithm Running Times

We will look at algorithms for four different problems:

  1. Minimum Element in an Array - given an array of N items, find the smallest item. This problem can be solved with the following algorithm: maintain a single variable called minimum in which the value of the smallest element seen so far is stored. Initialize this variable to the value of the first element in the array. Make a single sequential pass through the array and change the value of minimum whenever a smaller value is encountered. The running time of this algorithm is O(N). Reason: the same "work" is done for each of the N items in the array. A better algorithm is not possible, since each item in the array must be examined at least once.

     Algorithm
     Assumptions: an array named a that holds the values and is of size SIZE.

         min = a[0]
         for (i = 1; i < SIZE; i++)
             if (a[i] < min)
                 min = a[i]
         return min

     Analysis: makes N-1 iterations of the loop and is therefore O(N).

  2. Closest Points in the Plane - given N points in a plane (an x-y coordinate system), find the pair of points that are closest together. The algorithm for solving this problem calculates the distance between every possible pair of points and remembers the smallest value (i.e., the minimum distance). Note that there are N(N-1)/2 pairs of points, or "on the order of N^2" pairs. [Each of the N points can be paired with the other N-1 points for a total of N(N-1) pairs; this, however, double counts pairs like (A,B) and (B,A), which are the same pair, so the total is divided in half.] Calculating the distances for all of the pairs of points and storing the minimum distance will require quadratic time (O(N^2)). [Note: there are algorithms for this problem which are O(N log N) and O(N), but they are beyond the scope of our analysis.]

     The distance between two points (x_i, y_i) and (x_j, y_j) in an x-y coordinate system is given by the formula:

         $d = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2}$

     Two algorithms that solve this problem are shown below.

     Algorithm
     Assumptions: an array called a, of size SIZE, holding all of the points in the plane. A function dist(a, b) that returns the distance between any two points a and b; this function is O(1).

         x = 0                      // first of the pair
         y = 1                      // last of the pair
         min = dist(a[0], a[1])
         for (i = 0; i < SIZE; i++) {
             for (j = 0; j < SIZE; j++) {
                 if (i == j) skip to next j
                 if (dist(a[i], a[j]) < min) {
                     min = dist(a[i], a[j])
                     x = i
                     y = j
                 }
             }
         }

     Analysis: makes N(N-1) = N^2 - N distance comparisons and is thus O(N^2).
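As a rough illustration, the minimum-element pass and the all-pairs distance check described above might look like this in Python. This is a sketch, not the notes' own code: the names find_min and closest_pair_ordered and the evaluation counter are additions for illustration, and the points are assumed to be (x, y) tuples with at least two points present.

```python
import math


def find_min(a):
    """Single sequential pass over a non-empty list: N - 1 comparisons, O(N)."""
    minimum = a[0]
    for value in a[1:]:
        if value < minimum:
            minimum = value
    return minimum


def dist(p, q):
    """Euclidean distance between two (x, y) points; constant time."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def closest_pair_ordered(points):
    """Check every ordered pair (i, j) with i != j.

    Returns (minimum distance, i, j, number of distance evaluations).
    The counter is only there to make the N(N-1) cost visible.
    """
    n = len(points)
    best, x, y = dist(points[0], points[1]), 0, 1
    evaluations = 0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # a point is never paired with itself
            d = dist(points[i], points[j])
            evaluations += 1
            if d < best:
                best, x, y = d, i, j
    return best, x, y, evaluations
```

With four points the counter reports 4 * 3 = 12 evaluations, twice the number of distinct pairs.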

Example – Better Algorithm – Closest Points in a Plane

[Figure: the four points a[0], a[1], a[2], and a[3] plotted on an x-y grid, with x-axis ticks from 0 to 5 and y-axis ticks up to 4.]

Assume that a is a one-dimensional array that holds the points from the plane. Assume that the function dist(r, s) returns the distance between two points r and s. Assume that the constant SIZE is the number of elements in the array of points a.

Operation of the Algorithm for the Example Case

Recall that:
  distance(a[0], a[1]) = 2
  distance(a[0], a[2]) = 5^(1/2)
  distance(a[0], a[3]) = 3
  distance(a[1], a[2]) = 5^(1/2)
  distance(a[1], a[3]) = 13^(1/2)
  distance(a[2], a[3]) = 2^(1/2)

Note: the first algorithm would also have calculated distances for the pairs (a[1], a[0]), (a[2], a[0]), (a[3], a[0]), (a[2], a[1]), (a[3], a[1]), and (a[3], a[2]), that is, twice as many pairs. However, the distance from a[0] to a[1] is exactly the same as the distance from a[1] to a[0]!

outer loop iteration #1: i = 0
  inner loop iterates from 1 to 3 for this outer iteration
  checks distances for the following point pairs: (a[0], a[1]), (a[0], a[2]), (a[0], a[3])
  {minimum = 2, x = 0, y = 1}

outer loop iteration #2: i = 1
  inner loop iterates from 2 to 3 for this outer iteration
  checks distances for the following point pairs: (a[1], a[2]), (a[1], a[3])
  {minimum = 2, x = 0, y = 1}

outer loop iteration #3: i = 2
  inner loop iterates from 3 to 3 for this outer iteration
  checks distances for the following point pairs: (a[2], a[3])
  {minimum = 2^(1/2), x = 2, y = 3}

outer loop terminates
minimum returned as 2^(1/2), with the closest pair of points being a[2] and a[3]
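The trace above is consistent with an inner loop that starts at i + 1, so each unordered pair is examined once. A minimal Python sketch under that assumption follows. The function name closest_pair and the evaluation counter are illustrative, and the coordinates are hypothetical: they were chosen only because they reproduce the six distances listed in the example, since the figure's actual coordinates are not recoverable from the text.

```python
import math


def dist(r, s):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(r[0] - s[0], r[1] - s[1])


def closest_pair(a):
    """Examine each unordered pair once: j always starts just past i.

    Returns (minimum distance, x, y, number of distance evaluations).
    Assumes a holds at least two points.
    """
    size = len(a)
    minimum, x, y = dist(a[0], a[1]), 0, 1
    evaluations = 0
    for i in range(size - 1):
        for j in range(i + 1, size):
            d = dist(a[i], a[j])
            evaluations += 1
            if d < minimum:
                minimum, x, y = d, i, j
    return minimum, x, y, evaluations


# Hypothetical coordinates that match the distances in the example.
a = [(1, 1), (1, 3), (3, 2), (4, 1)]
minimum, x, y, evaluations = closest_pair(a)
print(minimum, x, y, evaluations)
```

This halves the work to N(N-1)/2 distance evaluations, but the growth rate is still quadratic, so the algorithm remains O(N^2).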

  3. Collinear Points in a Plane - given N points in a plane, determine if any three form a straight line. The problem is of interest because collinear points introduce a degenerate case that requires special handling. The direct solution requires the enumeration of all sets of three points from the plane. The number of different groups of three points is N(N-1)(N-2)/6. This yields a dominant term which is on the order of N^3 (a cubic function). This algorithm is even more computationally expensive than the previous quadratic algorithm. [Note: there is a quadratic time algorithm for this problem.] [Note: three items can be ordered in 6 different ways; consider (a,b,c), which can be written as (a,b,c), (a,c,b), (b,a,c), (b,c,a), (c,a,b), or (c,b,a), all of which represent the same group of points, so the count N(N-1)(N-2) of ordered triples is divided by 6.]
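The direct enumeration of triples can be sketched as follows. The notes do not say how to test three points for collinearity; the cross-product test used here is one common choice and is an assumption, as are the function names. Integer coordinates are assumed so that the comparison with zero is exact.

```python
from itertools import combinations


def collinear(p, q, r):
    """True if p, q, r lie on one line: the cross product of (q - p) and (r - p) is zero."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0]) == 0


def find_collinear_triple(points):
    """Enumerate each group of three points once.

    Returns (first collinear triple or None, number of triples examined).
    In the worst case all N(N-1)(N-2)/6 triples are examined.
    """
    examined = 0
    for p, q, r in combinations(points, 3):
        examined += 1
        if collinear(p, q, r):
            return (p, q, r), examined
    return None, examined
```

When no three points are collinear, five points lead to 5 * 4 * 3 / 6 = 10 triples being examined.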
  4. Sorted List Matching Problem – given two sorted lists of names, output the names common to both lists. Perhaps the standard way to attack this problem is the following:

     For each name on list #1, do the following:
       a) Search for the current name in list #2.
       b) If the name is found, output it.

     If a list is unsorted, steps a and b may take O(n) time. Can you tell me why? BUT, we know that both lists are already sorted. Thus we can use a binary search in step a. From CS1, we learned that this takes O(log n) time, where n is the total number of names in the list.

     For the moment, if we assume that both lists are of equal size, then we can safely say that the size of list #2 is about ½ the total input size, so technically, our search would take O(log(n/2)) time, where n is the TOTAL SIZE of our input to the problem. Using our log rules, however, we find that log_2 n = log_2(n/2) + 1. Thus, for large n it is fairly safe to treat our running time as simply O(log_2 n).

     Now, that is simply the running time for one loop iteration. But how many loop iterations are there? (Assume that there are n/2 names on each list, again, where n is the TOTAL SIZE of the input.) Under our assumption, there will be n/2 loop iterations, so our total running time would be O(n log_2 n). Why did I not divide the expression in the Big-O by 2?
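A minimal sketch of the binary-search approach just described, using Python's standard bisect module for the search. The function names contains_sorted and match_with_binary_search are illustrative, not from the notes.

```python
from bisect import bisect_left


def contains_sorted(sorted_names, name):
    """Binary search: O(log n) lookup in an already sorted list."""
    i = bisect_left(sorted_names, name)
    return i < len(sorted_names) and sorted_names[i] == name


def match_with_binary_search(list1, list2):
    """For each name on list1, binary-search list2; keep the names found."""
    return [name for name in list1 if contains_sorted(list2, name)]


list1 = ["Adams", "Baker", "Chen", "Diaz"]
list2 = ["Baker", "Diaz", "Evans"]
print(match_with_binary_search(list1, list2))
```

Each of the lookups costs O(log n), and there is one lookup per name on the first list, which is where the O(n log n) total comes from.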

The maximum number of iterations, then, would be the total number of names on both lists, which is n, using our previous interpretation. For each iteration, we are doing a constant amount of work (essentially a comparison, and sometimes outputting a single name). Thus, our algorithm runs in O(n) time, an improvement over our previous algorithm.

A final question one must ask is: can we solve this problem in even less time? If yes, what is such an algorithm? If no, how can we prove it?

Our proof goes along these lines: in order to have an accurate list, we must read every name on one of the two lists. If we skip names on BOTH lists, we can NOT deduce whether we would have matches between those names or not. Simply "reading" all the names on one list takes O(n/2) time. But, in order notation, this is still O(n), the running time of our second algorithm. Thus, we know we cannot do better in terms of time (within a constant factor) than our second algorithm.
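The description of the second algorithm does not appear in this part of the notes. One algorithm that fits the analysis above is a merge-style walk that keeps one position in each list and advances whichever position points at the smaller name. The sketch below rests on that assumption; the function name match_sorted_lists and the iteration counter are illustrative.

```python
def match_sorted_lists(list1, list2):
    """Walk both sorted lists once, advancing past the smaller name each step.

    Returns (names common to both lists, number of loop iterations).
    Every iteration advances at least one position, so the loop runs
    at most len(list1) + len(list2) times.
    """
    i = j = 0
    common = []
    iterations = 0
    while i < len(list1) and j < len(list2):
        iterations += 1
        if list1[i] < list2[j]:
            i += 1
        elif list1[i] > list2[j]:
            j += 1
        else:
            common.append(list1[i])  # same name on both lists
            i += 1
            j += 1
    return common, iterations
```

Each iteration does a constant amount of work, so the total is O(n) where n is the combined length of the two lists.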