




An analysis of algorithms, focusing on the sorting problem and insertion sort as one solution. Covers observations of running time, experimental hypotheses, and predictions, and discusses how to measure and estimate the running time of algorithms.





Charles Babbage (1864)
Analytical Engine (schematic)
Analysis of algorithms. Framework for comparing algorithms and predicting performance.

Scientific method.
- Observe some feature of the universe.
- Hypothesize a model that is consistent with observation.
- Predict events using the hypothesis.
- Verify the predictions by making further observations.
- Validate the theory by repeating the previous steps until the hypothesis agrees with the observations.

Universe = computer itself.
Sorting problem:
- Given N items, rearrange them in ascending order.
- Applications: statistics, databases, data compression, computational biology, computer graphics, scientific computing, ...

Example input: Hauser, Hong, Hsu, Hayes, Haskell, Hanley, Hornet
Sorted output: Hanley, Haskell, Hauser, Hayes, Hong, Hornet, Hsu
Insertion Sort

Insertion sort.
- Brute-force sorting solution.
- Move left-to-right through array.
- Exchange next element with larger elements to its left, one-by-one.
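The three steps above can be sketched in Java as follows. This is a minimal illustration, not the code from the original slides: the class name `InsertionSortSketch` and the choice to return the comparison count are assumptions made here so that the count can be tabulated later.

```java
import java.util.Arrays;

// Hypothetical demo class; the name and the counting return value are not from the slides.
final class InsertionSortSketch {

    // Sorts a[] in ascending order and returns the number of comparisons made.
    static long sort(double[] a) {
        long comparisons = 0;
        // Move left-to-right through the array.
        for (int i = 1; i < a.length; i++) {
            // Exchange a[j] with larger elements to its left, one by one.
            for (int j = i; j > 0; j--) {
                comparisons++;
                if (a[j] >= a[j - 1]) break;   // element has reached its place
                double tmp = a[j];
                a[j] = a[j - 1];
                a[j - 1] = tmp;
            }
        }
        return comparisons;
    }

    public static void main(String[] args) {
        double[] a = {0.7, 0.1, 0.9, 0.4, 0.3};
        long comparisons = sort(a);
        System.out.println(Arrays.toString(a) + " using " + comparisons + " comparisons");
    }
}
```

On an input in reverse order every element travels all the way left, so the count reaches N(N-1)/2; on an already sorted input it is N-1.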
Insertion Sort: Observation

Observe and tabulate running time for various values of N.
- Data source: N random numbers between 0 and 1.

N          Comparisons
5,000      6.2 million
10,000     25 million
20,000     99 million
40,000     400 million
80,000     1,600 million

Insertion Sort: Experimental Hypothesis

Data analysis. Plot # comparisons vs. input size on log-log scale.
Regression. Fit line through data points: # comparisons ≈ a N^b, where b is the slope of the line.
Hypothesis. # comparisons grows quadratically with input size: ~ N^2/4.

Insertion Sort: Prediction and Verification

Experimental hypothesis. # comparisons ~ N^2/4.

Prediction. 400 million comparisons for N = 40,000.
Observations.

N          Comparisons
40,000     401.3 million
40,000     399.7 million
40,000     401.6 million
40,000     400.0 million

Prediction. 10 billion comparisons for N = 200,000.
Observation.

N          Comparisons
200,000    9.997 billion
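The regression step can be reproduced with an ordinary least-squares fit on the logarithms of the data. The sketch below is an assumption about how one might do it, not the tool used for the slides: the class `LogLogFitSketch` and method `fitPowerLaw` are hypothetical names, and only the tabulated rows for N = 5,000 through 40,000 are used. On those rows the fitted exponent b should come out close to 2 and the coefficient a close to 1/4, in line with the N^2/4 hypothesis.

```java
// Hypothetical helper for fitting y ≈ a * x^b; not from the original slides.
final class LogLogFitSketch {

    // Returns {a, b} from a least-squares line through the points (ln x, ln y).
    static double[] fitPowerLaw(double[] x, double[] y) {
        int n = x.length;
        double sumX = 0, sumY = 0, sumXY = 0, sumXX = 0;
        for (int i = 0; i < n; i++) {
            double lx = Math.log(x[i]);
            double ly = Math.log(y[i]);
            sumX += lx;
            sumY += ly;
            sumXY += lx * ly;
            sumXX += lx * lx;
        }
        double b = (n * sumXY - sumX * sumY) / (n * sumXX - sumX * sumX); // slope
        double lnA = (sumY - b * sumX) / n;                               // intercept
        return new double[] {Math.exp(lnA), b};
    }

    public static void main(String[] args) {
        // Comparison counts as tabulated in the observation slide.
        double[] n = {5_000, 10_000, 20_000, 40_000};
        double[] comparisons = {6.2e6, 25e6, 99e6, 400e6};
        double[] fit = fitPowerLaw(n, comparisons);
        System.out.printf("a = %.3f, b = %.3f%n", fit[0], fit[1]);
        // Extrapolate to the N = 200,000 prediction using the fitted model.
        System.out.printf("predicted for N = 200,000: %.3g%n", fit[0] * Math.pow(200_000, fit[1]));
    }
}
```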
Insertion Sort: Experimental Hypothesis

Data analysis. Plot time vs. input size on log-log scale.
Regression. Fit line through data points: time ≈ a N^b.
Hypothesis. Running time grows quadratically with input size.

Timing in Java

Wall clock. Measure time between beginning and end of computation.
- Manual: Skagen wristwatch.
- Automatic: Stopwatch.java library.
```java
public class Stopwatch {
    private static long start;

    // Record the current wall-clock time as the starting point.
    public static void tic() {
        start = System.currentTimeMillis();
    }

    // Return the number of seconds elapsed since the last call to tic().
    public static double toc() {
        long stop = System.currentTimeMillis();
        return (stop - start) / 1000.0;
    }
}
```

Measuring Running Time

Factors that affect running time.
- Machine.
- Compiler.
- Algorithm.
- Input data.

More factors.
- Caching.
- Garbage collection.
- Just-in-time compilation.
- CPU used by other processes.

Bottom line. Often hard to get precise measurements.

Summary

Analysis of algorithms. Framework for comparing algorithms and predicting performance.

Scientific method.
- Observe some feature of the universe.
- Hypothesize a model that is consistent with observation.
- Predict events using the hypothesis.
- Verify the predictions by making further observations.
- Validate the theory by repeating the previous steps until the hypothesis agrees with the observations.

Remaining question. How to formulate a hypothesis?
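To make the wall-clock approach concrete, here is a small doubling experiment. It is a sketch under stated assumptions: `DoublingTimerSketch` and `elapsedSeconds` are hypothetical names, and it uses `System.nanoTime()` rather than the `System.currentTimeMillis()` call in the Stopwatch class, because the standard library documents `nanoTime` as the call intended for measuring elapsed time. The factors listed under Measuring Running Time (JIT compilation, other processes, and so on) still apply, so the printed ratios are only roughly repeatable.

```java
import java.util.Random;

// Hypothetical demo class; not part of the Stopwatch.java library named in the slides.
final class DoublingTimerSketch {

    // Runs the task once and returns the elapsed wall-clock time in seconds.
    static double elapsedSeconds(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1e9;
    }

    // Plain insertion sort, used here only as the workload being timed.
    static void insertionSort(double[] a) {
        for (int i = 1; i < a.length; i++) {
            for (int j = i; j > 0 && a[j] < a[j - 1]; j--) {
                double tmp = a[j];
                a[j] = a[j - 1];
                a[j - 1] = tmp;
            }
        }
    }

    public static void main(String[] args) {
        Random random = new Random(42);   // fixed seed so the inputs are repeatable
        double previous = Double.NaN;     // no ratio is available for the first size
        for (int n = 2_000; n <= 16_000; n *= 2) {
            double[] a = random.doubles(n).toArray();
            double seconds = elapsedSeconds(() -> insertionSort(a));
            // A ratio near 4 when N doubles is what a quadratic hypothesis predicts.
            System.out.printf("N = %6d  time = %.4f s  ratio = %.2f%n", n, seconds, seconds / previous);
            previous = seconds;
        }
    }
}
```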
Robert Sedgewick and Kevin Wayne • Copyright © 2005 • http://www.Princeton.EDU/~cos 226
Worst case running time. Obtain bound on running time of algorithm on any input of a given size N.
- Generally captures efficiency in practice.
- Draconian view, but hard to find effective alternative.

Average case running time. Obtain bound on running time of algorithm on random input as a function of input size N.
- Hard to accurately model real instances by random distributions.
- May perform poorly on other distributions.

Amortized running time. Worst-case bound on running time of any sequence of N operations.
Total running time: sum of cost × frequency for all of the basic ops.
- Cost depends on machine, compiler.
- Frequency depends on algorithm, input.

Cost for sorting.
- A = # exchanges.
- B = # comparisons.
- Cost on a typical machine = 11A + 4B.

Frequency of sorting ops.
- N = # elements to sort.
- Selection sort: A = N-1, B = N(N-1)/2.

Donald Knuth, 1974 Turing Award

An easier alternative.
(i) Analyze asymptotic growth as a function of input size N.
(ii) For medium N, run and measure time.
(iii) For large N, use (i) and (ii) to predict time.

Asymptotic growth rates.
- Estimate as a function of input size N.
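The cost × frequency model for selection sort can be checked by instrumenting the algorithm. The following is a minimal sketch: the class `SelectionSortCostSketch` and its methods are hypothetical, and it assumes one exchange per pass even when the minimum is already in place, which is what gives A = N-1 regardless of input. The weights 11 and 4 are the "typical machine" costs quoted above.

```java
// Hypothetical demo class; names are illustrative, not from the slides.
final class SelectionSortCostSketch {

    // Sorts a[] in ascending order and returns {A, B} = {# exchanges, # comparisons}.
    static long[] sort(int[] a) {
        long exchanges = 0;
        long comparisons = 0;
        int n = a.length;
        for (int i = 0; i < n - 1; i++) {
            int min = i;
            for (int j = i + 1; j < n; j++) {
                comparisons++;
                if (a[j] < a[min]) min = j;
            }
            // One exchange per pass, even if min == i.
            int tmp = a[i];
            a[i] = a[min];
            a[min] = tmp;
            exchanges++;
        }
        return new long[] {exchanges, comparisons};
    }

    // Total cost under the model: 11 units per exchange, 4 units per comparison.
    static long cost(long exchanges, long comparisons) {
        return 11 * exchanges + 4 * comparisons;
    }

    public static void main(String[] args) {
        int[] a = {9, 4, 7, 1, 8, 2, 6, 3, 10, 5};
        long[] counts = sort(a);
        System.out.println("A = " + counts[0] + ", B = " + counts[1]
                + ", cost = " + cost(counts[0], counts[1]));
    }
}
```

Substituting the frequencies into the cost gives 11(N-1) + 4·N(N-1)/2 = 2N^2 + 9N - 11, so the quadratic term dominates for large N.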
Logarithmic Time

Logarithmic time. Running time is O(log N).
Searching in a sorted list. Given a sorted array of items, find index of query item.
O(log N) solution. Binary search.
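Binary search can be sketched as below. The class and method names are hypothetical, and returning -1 for a missing key is a convention chosen here, not something the slides specify. Each iteration halves the remaining range, which is where the O(log N) bound comes from.

```java
// Hypothetical demo class; the -1 "not found" convention is an assumption.
final class BinarySearchSketch {

    // Returns an index of key in the sorted array a[], or -1 if key is absent.
    static int indexOf(int[] a, int key) {
        int lo = 0;
        int hi = a.length - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;   // written this way to avoid int overflow
            if (key < a[mid]) {
                hi = mid - 1;               // key can only be in the left half
            } else if (key > a[mid]) {
                lo = mid + 1;               // key can only be in the right half
            } else {
                return mid;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] sorted = {1, 3, 5, 7, 9, 11};
        System.out.println(indexOf(sorted, 7));
        System.out.println(indexOf(sorted, 4));
    }
}
```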
Linear Time

Linear time. Running time is O(N).
Find the maximum. Find the maximum value of N items in an array.
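A single pass over the array is enough to find the maximum, as in this minimal sketch. The class name and the decision to reject an empty array with an exception are assumptions made here.

```java
// Hypothetical demo class; rejecting empty input is a choice made for this sketch.
final class FindMaxSketch {

    // Returns the largest value in a[], examining each item exactly once.
    static double max(double[] a) {
        if (a.length == 0) {
            throw new IllegalArgumentException("array must not be empty");
        }
        double max = a[0];
        for (int i = 1; i < a.length; i++) {
            if (a[i] > max) max = a[i];
        }
        return max;
    }

    public static void main(String[] args) {
        System.out.println(max(new double[] {0.3, 0.9, 0.1, 0.7}));
    }
}
```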
Linearithmic Time

Linearithmic time. Running time is O(N log N).
Sorting. Given an array of N elements, rearrange in ascending order.
O(N log N) solution. Mergesort. [stay tuned]
Remark. Ω(N log N) comparisons required. [stay tuned]

Quadratic Time

Quadratic time. Running time is O(N^2).
Closest pair of points. Given N points in the plane, find closest pair.
O(N^2) solution. Enumerate all pairs of points.
Remark. Ω(N^2) seems inevitable, but this is just an illusion.
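The enumerate-all-pairs solution to the closest pair problem can be sketched as follows. The class name, the parallel-array representation of points, and returning only the distance rather than the pair itself are all assumptions made to keep the example short. The nested loops visit N(N-1)/2 pairs, which gives the quadratic running time.

```java
// Hypothetical demo class; points are given as parallel coordinate arrays for brevity.
final class ClosestPairSketch {

    // Returns the smallest Euclidean distance between any two points (x[i], y[i]).
    static double closestDistance(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need at least two points with matching coordinates");
        }
        double best = Double.POSITIVE_INFINITY;
        for (int i = 0; i < x.length; i++) {
            for (int j = i + 1; j < x.length; j++) {   // each unordered pair once
                double d = Math.hypot(x[i] - x[j], y[i] - y[j]);
                if (d < best) best = d;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] x = {0, 3, 10, 3};
        double[] y = {0, 4, 10, 5};
        System.out.println(closestDistance(x, y));
    }
}
```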
Exponential Time

Exponential time. Running time is O(a^N) for some constant a > 1.
Fibonacci sequence: 1 1 2 3 5 8 13 21 34 55 …
O(φ^N) solution. Spectacularly inefficient!
Efficient solution.

$$F(N) = \left[ \frac{\phi^N}{\sqrt{5}} \right], \qquad \phi = \frac{1 + \sqrt{5}}{2}$$

Here [·] is the nearest integer function.

Summary of Common Hypotheses

Complexity  Description                                                               When N doubles, running time
1           Constant algorithm is independent of input size.                          does not change
log N       Logarithmic algorithm gets slightly slower as N grows.                    increases by a constant
N           Linear algorithm is optimal if you need to process N inputs.              doubles
N log N     Linearithmic algorithm scales to huge problems.                           slightly more than doubles
N^2         Quadratic algorithm practical for use only on relatively small problems.  quadruples
2^N         Exponential algorithm is not usually practical.                           squares!
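Returning to the Fibonacci example from the exponential-time slide, the two solutions can be compared directly. This is a sketch under assumptions: the slides' own code is not in the text, so the naive recursion shown here is the standard textbook version, and the names `FibonacciSketch`, `slow`, and `fast` are hypothetical. The closed form relies on double-precision arithmetic, so it is only trustworthy for moderate N (roughly up to 70).

```java
// Hypothetical demo class; indexing follows the sequence 1 1 2 3 5 8 ..., so F(1) = F(2) = 1.
final class FibonacciSketch {

    static final double PHI = (1 + Math.sqrt(5)) / 2;   // golden ratio

    // Naive recursion: recomputes the same subproblems, so it takes exponential time.
    static long slow(int n) {
        if (n <= 2) return 1;
        return slow(n - 1) + slow(n - 2);
    }

    // Closed form: nearest integer to phi^N / sqrt(5). Limited by double precision.
    static long fast(int n) {
        return Math.round(Math.pow(PHI, n) / Math.sqrt(5));
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 10; n++) {
            System.out.println("F(" + n + ") = " + slow(n) + " (recursive), " + fast(n) + " (closed form)");
        }
    }
}
```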